probes. A double stranded probe may then be obtained by either synthesizing the
complementary strand and hybridizing the strands together under appropriate conditions or
by adding the complementary strand using DNA polymerase with an appropriate primer
sequence. Such cDNA probes may be used in the design of oligonucleotide probes and
primers for screening and cloning such genes, e.g., using well known PCR techniques, or,
alternatively, may be used to detect the presence or absence of a PfEMP1 gene in a cell. Such
nucleic acids or fragments may comprise part or all of the cDNA sequence that encodes the
polypeptides of the present invention. Effective cDNA probes may comprise as few as 15
consecutive nucleotides in the cDNA sequence, but will often comprise longer segments.
Further, these probes may comprise an additional nucleotide sequence, such as a
transcriptional primer sequence for cloning, or a detectable group for easy identification and
location of complementary sequences.
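
As an illustrative sketch only, and not part of the disclosed method, the 15-nucleotide minimum mentioned above can be pictured as sliding a fixed-length window along a cDNA sequence. The sequence example_cdna and both function names are hypothetical. A real probe would also be chosen for specificity and hybridization behavior, which this sketch does not assess.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def candidate_probes(cdna: str, length: int = 15) -> list[str]:
    """List every window of `length` consecutive nucleotides in `cdna`."""
    if length < 15:
        # 15 nt is the lower bound quoted in the text for an effective probe.
        raise ValueError("probe length should be at least 15 nucleotides")
    cdna = cdna.upper()
    return [cdna[i:i + length] for i in range(len(cdna) - length + 1)]


# Hypothetical 20-nt fragment, used only to exercise the functions.
example_cdna = "ATGGCTACCGTTAGCCTAGG"
probes = candidate_probes(example_cdna, 15)

# Complementary strands, e.g. for forming a double stranded probe.
complements = [reverse_complement(p) for p in probes]
```

Each entry in complements is the strand that would hybridize to the corresponding entry in probes.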

cDNA or genomic libraries of various types may be screened for new alleles or 
related sequences using the above probes. The choice of cDNA libraries normally 
corresponds to tissue sources which are abundant in mRNA for the desired polypeptides.
Phage libraries are normally preferred, e.g., λgt11, but plasmid or YAC libraries may also be
used. Clones of a library are spread onto plates, transferred to a substrate for screening, 
denatured, and probed for the presence of the desired sequences. 

In a related aspect, the nucleic acids of the present invention also include the PCR
product or RT-PCR product, produced using the above described primer probes. For
example, primer probes derived from the nucleotide sequence shown, described and/or
referenced herein (including incorporated by reference), may be used to amplify sequences
from different malaria parasites, and in particular, different strains of P. falciparum. 

The nucleic acids of the present invention may be present in whole cells, cell lysates 
or in partially pure or substantially pure or isolated form. Such "substantially pure" or 
"isolated" forms of these nucleic acids generally refer to the nucleic acid separated from 
contaminants with which it is generally associated, e.g., lipids, proteins and other nucleic 
acids. The nucleic acids of the present invention will be greater than about 50% pure. 
Typically, the nucleic acids will be more than about 60% pure, more typically, from about 
75% to about 90% pure, and preferably, from about 95% to about 98% pure. 

The present invention also provides substantially similar nucleic acid sequences, 
allelic variations and natural or induced sequences of the above described nucleic acids, as
well as chemically modified and substituted nucleic acids, e.g., those which incorporate
modified nucleotide bases or which incorporate a labeling group. In addition to comprising a
segment which encodes a PfEMP1 protein or fragment thereof, the nucleic acids of the
present invention may also comprise a segment encoding a heterologous protein, such that 
the gene is expressed to produce the two proteins as a fusion protein, as substantially 
described above. 

In addition to their use as probes, the nucleic acids of the present invention may also 
be used in the preparation of the polypeptides of the present invention, as described above. 
DNA encoding the polypeptides of the present invention will typically be incorporated into 
DNA constructs capable of introduction to and expression in an in vitro cell culture. Often, 
the nucleic acids of the present invention may be used to produce a suitable recombinant host 
cell. 

Specifically, DNA constructs will be suitable for replication in a unicellular host, such
as bacteria, e.g., E. coli, viruses or yeast, but may also be intended for introduction into
cultured mammalian, plant, insect, or other eukaryotic cell lines. DNA constructs prepared
for introduction into bacteria or yeast will typically include a replication system recognized
by the host, the intended DNA segment encoding the desired polypeptide, and transcriptional and
translational initiation and termination regulatory sequences operably linked to the
polypeptide encoding segment. A DNA segment is operably linked when it is placed into a
functional relationship with another DNA segment. For example, a promoter or enhancer is 
operably linked to a coding sequence if it stimulates the transcription of the sequence; DNA 
for a signal sequence is operably linked to DNA encoding a polypeptide if it is expressed as a 
preprotein that participates in the secretion of the polypeptide. Generally, DNA sequences 
that are operably linked are contiguous, and in the case of a signal sequence both contiguous 
and in reading phase. However, enhancers need not be contiguous with the coding sequences 
whose transcription they control. Linking is accomplished by ligation at convenient 
restriction sites or at adapters or linkers inserted in lieu thereof. The selection of an 
appropriate promoter sequence will generally depend upon the host cell selected for the 
expression of the DNA segment. 

Examples of suitable promoter sequences include prokaryotic and eukaryotic
promoters well known in the art. See, e.g., Sambrook et al., supra. The transcriptional
regulatory sequences will typically include a heterologous enhancer or promoter which is
recognized by the host. The selection of an appropriate promoter will depend upon the host,
but promoters such as the trp, lac and phage promoters, tRNA promoters and glycolytic
enzyme promoters are known and available. See Sambrook et al., supra.

Conveniently available expression vectors which include the replication system and
transcriptional and translational regulatory sequences together with the insertion site for the
PfEMP1 pol

combinations of cell lines and expression vectors are described in Sambrook et al., supra, and
in Metzger et al., Nature 334:31-36 (1988).

The vectors containing the DNA segments of interest, e.g., those encoding
polypeptides comprising a PfEMP1 protein or fragments thereof, can be transferred into the
host cell by well known methods, which may vary depending upon the type of host used. For
example, calcium chloride transfection is commonly used for prokaryotic cells, whereas
calcium phosphate treatment may be used for other hosts. See Sambrook et al., supra. The
term "transformed cell" as used herein, includes the progeny of originally transformed cells.

Techniques for manipulation of nucleic acids which encode the polypeptides of the
present invention, i.e., subcloning the nucleic acids into expression vectors, labeling probes,
DNA hybridization and the like, are generally described in Sambrook, et al., supra. In
recombinant methods, generally the nucleic acid encoding a peptide of the present invention 
is first cloned or isolated in a form suitable for ligation into an expression vector. After 
ligation, the vectors containing the nucleic acid fragments or inserts are introduced into a
suitable host cell, for the expression of the polypeptide of the invention. The polypeptides
may then be purified or isolated from the host cells. Methods for the synthetic preparation of
oligonucleotides are generally described in Gait, Oligonucleotide Synthesis: A Practical
Approach, IRL Press (1990). 

There are various methods of isolating the nucleic acids which encode the 
polypeptides of the present invention. Typically, the DNA is isolated from a genomic or 
cDNA library using labeled oligonucleotide probes specific for sequences in the desired 
DNA. Restriction endonuclease digestion of genomic DNA or cDNA containing the 
appropriate genes can be used to isolate the DNA encoding the binding domains of these 
proteins. From the PfEMP1 sequence given (as shown herein), a panel of restriction
endonucleases can be constructed to give cleavage of the DNA in desired regions, i.e., to
obtain segments which encode biologically active fragments of the PfEMP1 protein.
Following restriction endonuclease digestion, DNA encoding the polypeptides of the present
invention is identified by its ability to hybridize with a nucleic acid probe in, for example, a
Southern blot format. These regions are then isolated using standard methods. See, e.g.,
Sambrook, et al., supra.
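
The choice of a restriction endonuclease panel described above can be sketched, under simplifying assumptions, as a search for recognition sites in a known sequence. The three enzymes listed are common examples and are not named in this document. The sequence is hypothetical, only the top strand of a linear molecule is modeled, and the function names are illustrative.

```python
# Recognition sequences and top-strand cut offsets of three common enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, enzyme: str) -> list[str]:
    """Cut a linear sequence with one enzyme and return top-strand fragments."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = [p + offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence containing one EcoRI site and one BamHI site.
example = "ATGCGAATTCTTAGGCGGATCCAATG"
fragments = digest(example, "EcoRI")
```

An enzyme with no site in the region of interest, such as HindIII in this example, leaves the sequence as a single fragment.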

The polymerase chain reaction, or "PCR" can also be used to prepare nucleic acids 
which encode the polypeptides of the present invention. PCR technology is used to amplify 
nucleic acid sequences of the desired nucleic acid, e.g., the DNA which encodes the
polypeptides of the invention, directly from mRNA, cDNA, or genomic or cDNA libraries. 

Appropriate primers and probes for amplifying the nucleic acids described herein
may be generated from analysis of the PfEMP1 oligonucleotide sequence, such as those
shown, described and/or referenced herein (including incorporated by reference) and Table 1.
Briefly, oligonucleotide primers complementary to the two 3' borders of the DNA region to
be amplified are synthesized. The PCR is then carried out using the two primers. See, e.g.,
PCR Protocols: A Guide to Methods and Applications (Innis, M., Gelfand, D., Sninsky, J. and
White, T., eds.) Academic Press (1990). Primers can be selected to amplify various sized
segments from the PfEMP1 oligonucleotide sequence. The primers may also contain a
restriction site and additional bases to permit "in-frame" cloning of the insert into an
appropriate expression vector, using the restriction sites present on the primers.
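
A minimal sketch of the primer construction just described follows, assuming a known target region. The target sequence, the 18-nt core length, the BamHI and EcoRI sites and the two extra 5' bases are all illustrative assumptions rather than values taken from this document. The sketch does not check reading frame, which depends on the chosen vector.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def design_primers(target: str, core_len: int = 18,
                   fwd_site: str = "GGATCC",   # BamHI, assumed for illustration
                   rev_site: str = "GAATTC",   # EcoRI, assumed for illustration
                   extra: str = "CG") -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3'.

    Each primer is: extra bases + restriction site + target-specific core.
    The forward core matches the start of the top strand, so it anneals to
    the 3' border of the bottom strand; the reverse core is the reverse
    complement of the end of the top strand, so it anneals to the 3' border
    of the top strand.
    """
    target = target.upper()
    if len(target) < core_len:
        raise ValueError("target is shorter than the requested primer core")
    forward = extra + fwd_site + target[:core_len]
    reverse = extra + rev_site + reverse_complement(target[-core_len:])
    return forward, reverse


# Hypothetical 36-nt target region, used only to exercise the function.
example_target = "ATGGCTACCGTTAGCCTAGGATTTCCGAAGTCTGCA"
forward_primer, reverse_primer = design_primers(example_target)
```

In practice the number of additional bases between the restriction site and the core would be adjusted so that the insert stays in frame with the vector's coding sequence.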
Antibodies 

The nucleic acids and polypeptides of the present invention, or fragments thereof, are 
also useful in producing antibodies, either polyclonal or monoclonal. These antibodies are 
produced by immunizing an appropriate vertebrate host, e.g., rat, mouse, rabbit or goat, with 
a polypeptide of the invention, or its fragment, or plasmid DNA containing a nucleic acid of 
the invention, alone or in conjunction with an adjuvant. Usually, two or more immunizations
are involved, and a few days following the last injection, the blood or spleen of the host will 
be harvested. 

For production of polyclonal antibodies, an appropriate target immune system is 
selected, typically a mouse or rabbit, but also including goats, sheep, cows, guinea pigs, 
monkeys and rats. The substantially purified antigen or plasmid is presented to the immune 
system in a fashion determined by methods appropriate for the animal. These and other 
parameters are well known to immunologists. Typically, injections are given in the footpads, 
intramuscularly, intradermally or intraperitoneally. The immunoglobulins produced by the
host can be precipitated, isolated and purified by routine methods, including affinity
purification.

For monoclonal antibodies, appropriate animals will be selected and the desired 
immunization protocol followed. After the appropriate period of time, the spleens of these 
animals are excised and individual spleen cells are fused, typically, to immortalized myeloma 
cells under appropriate selection conditions. Thereafter, the cells are clonally separated and 
the supernatants of each clone are tested for the production of an appropriate antibody 
specific for the desired region of the antigen. Techniques for producing antibodies are well 
known in the art. See, e.g., Goding et al., Monoclonal Antibodies: Principles and Practice (2d
ed.) Acad. Press, N.Y., and Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring 
Harbor Laboratory, New York (1988). Other suitable techniques involve the in vitro exposure
of lymphocytes to the antigenic polypeptides or, alternatively, the selection of libraries of
antibodies in phage or similar vectors. Huse et al., Generation of a Large Combinatorial
Library of the Immunoglobulin Repertoire in Phage Lambda, Science 246:1275-1281 (1989).
Monoclonal antibodies with affinities of 10^8 liters/mole, preferably 10^9 to 10^10 or stronger,
will be produced by these methods.
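
For orientation, an affinity stated in liters/mole is an association constant, and its reciprocal is the dissociation constant in moles/liter. The short sketch below, with a hypothetical function name, converts the figures quoted above into nanomolar dissociation constants.

```python
def kd_nanomolar(ka_liters_per_mole: float) -> float:
    """Convert an association constant (L/mol) to a dissociation constant (nM)."""
    if ka_liters_per_mole <= 0:
        raise ValueError("association constant must be positive")
    # Kd (mol/L) = 1 / Ka; multiply by 1e9 to express it in nanomolar.
    return 1e9 / ka_liters_per_mole


# Affinities quoted in the text: 10^8, 10^9 and 10^10 liters/mole.
kd_values = {ka: kd_nanomolar(ka) for ka in (1e8, 1e9, 1e10)}
```

On this scale, 10^8 liters/mole corresponds to about 10 nM, 10^9 to about 1 nM, and 10^10 to about 0.1 nM.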

The antibodies generated can be used for a number of purposes, e.g., as probes in 
immunoassays, for inhibiting PfEMP1 binding to its ligands, thereby inhibiting or reducing
erythrocyte sequestration, in diagnostics or therapeutics, or in research to further elucidate 
the mechanism of various aspects of malarial infection, and particularly, P. falciparum 
infection. The antibodies of the present invention can be used with or without modification. 
Frequently, the antibodies will be labeled by joining, either covalently or non-covalently, a 
substance which provides for a detectable signal. Such labels include those that are well 
known in the art, such as the labels described previously for the polypeptides of the 
invention. Additionally, the antibodies of the invention may be chimeric, human-like or 
humanized, in order to reduce their potential antigenicity, without reducing their affinity for 
their target. Chimeric, human-like and humanized antibodies have generally been described 
in the art. Generally, such chimeric, human-like or humanized antibodies comprise variable 
regions, e.g., complementarity determining regions (CDR) (for humanized antibodies), from 
a mammalian animal, e.g., a mouse, and a human framework region. By incorporating as little
foreign sequence as possible in the hybrid antibody, the antigenicity is reduced. Preparation 
of these hybrid antibodies may be carried out by methods well known in the art.

Preferred antibodies are those that are specifically immunoreactive with the
polypeptides of the present invention and their immunologically active fragments. The phrase
"specifically immunoreactive," when referring to the interaction between an antibody of the
invention and a particular protein, refers to an antibody that specifically recognizes and binds
with relatively high affinity to the particular protein, such that this binding is determinative
of the presence of the protein in a heterogeneous population of proteins and other biologics.
Thus, under designated immunoassay conditions, the specified antibodies bind to a particular
protein and do not bind in a significant amount to other proteins present in the sample. A
variety of immunoassay formats may be used to select antibodies specifically
immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays
are routinely used to select monoclonal antibodies specifically immunoreactive with a 
protein. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor 
Publications, New York, for a description of immunoassay formats and conditions that can be 
used to determine specific immunoreactivity. 

The antibodies generated can be used for a number of purposes, e.g., as probes in 
immunoassays, for inhibiting interaction between a PfEMP1 protein and its ligand, e.g.,
CD36, TSP, ICAM-1, VCAM-1, ELAM-1, or chondroitin sulfate, thereby inhibiting or reducing
the level of PfEMP1-ligand interaction, in diagnostics or therapeutics, or in research to
further elucidate the mechanism of malarial pathology, e.g., erythrocyte sequestration. Where 
the antibodies are used to block or reverse the interaction between a polypeptide of the 
invention and an associating ligand or PE, the antibody will generally be referred to as a 
"blocking antibody." Preferred antibodies are those monoclonal or polyclonal antibodies 
which specifically recognize and bind the polypeptides of the invention. Accordingly, these 
preferred antibodies will specifically recognize and bind the polypeptides which have an 
amino acid sequence that is substantially homologous to the relevant amino acid sequence 
shown, described and/or referenced herein (including incorporated by reference), or
immunologically active fragments thereof. Still more preferred are antibodies which are
capable of forming an antibody-ligand complex with the relatively conserved polypeptide
fragments of PfEMP1 sequences, and are thereby capable of blocking an interaction between
PfEMP1 from a variety of P. falciparum strains and PfEMP1 ligands.



METHODS OF USE

The polypeptides, antibodies, and nucleic acids of the present invention have a variety
of important uses, including, but not limited to, diagnostic, screening, prophylactic, including 
vaccination, and therapeutic applications. 
Diagnostic Applications 

In a particularly preferred aspect, the present invention provides methods and 
reagents useful in detecting the presence of PfEMP1 in a sample. These detection methods
are particularly useful in diagnosing malarial infections in a patient. For example, in a
particularly preferred aspect, the antibodies of the present invention may be used to assay for
the presence or absence of PfEMP1 in a sample. Immunoassay techniques for the detection
of the particular antigen are very well known in the art. For a review of immunological and 
immunoassay procedures in general, see Basic and Clinical Immunology 7th Edition (D. 
Stites and A. Terr ed.) 1991. 

Moreover, the immunoassays of the present invention can be performed in any of 
several configurations, which are reviewed extensively in Enzyme Immunoassay, E.T. 
Maggio, ed., CRC Press, Boca Raton, Florida (1980); "Practice and Theory of Enzyme 
Immunoassays," P. Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology, 
Elsevier Science Publishers B.V. Amsterdam (1985); and, Harlow and Lane, Antibodies, A 
Laboratory Manual, supra. Generally, these methods comprise contacting the antibody with a 
sample to be tested, and detecting any specific binding between the antibody and a protein 
within the sample. Typically, this will be in a blot format, e.g., western blot, or in an ELISA
format. Methods of performing these assay formats are well known in the art. See, e.g., Basic
and Clinical Immunology, 7th ed. (D. Stites and A. Terr, eds., 1991).

Typically, these diagnostic methods comprise contacting a sample with an antibody to
PfEMP1, as described herein, and determining whether the antibody binds to any portion of
the sample. In the case of human diagnostic techniques, the sample may be a whole blood
sample, or some fraction thereof, e.g., an erythrocyte-containing sample. Generally, such
diagnostic methods are well known in the art, and are described in the above described
references. The immunoreactivity of the antibody with the sample indicates the presence of
PfEMP1 in the sample, and, in the case of a sample derived from a patient, a possible
malarial infection.

Alternatively, labeled polypeptides of the present invention may be used as diagnostic
reagents in detecting the presence or absence of antibodies to PfEMP1 in a patient. The
presence of antibodies within a patient would be indicative that the patient had been exposed
to a malaria parasite sufficiently to result in an antigenic response.

Similarly, the nucleic acid probes of the invention may be used
to identify the presence in a sample of a DNA segment encoding a PfEMP1 polypeptide,
or as PCR or RT-PCR primers to amplify and then detect PfEMP1 encoding nucleic acid
segments. Such assays typically involve the immobilization of nucleic acids in the sample,
followed by interrogation of the immobilized sequences with a chemically labeled
oligonucleotide probe, as described herein. Hybridization of the probe to the immobilized
sample indicates the presence of a DNA segment encoding PfEMP1, and thus, a malarial
infection. As described above, assays may be further designed to indicate not only the
presence of a malarial parasite, but also the strain of parasite present. Although
described in terms of an immobilized sample probed with a solution based oligonucleotide
probe, a wide variety of assay conformations may be adopted, which conformations are
generally well known in the art.
Screening Applications 

In one aspect, the present invention provides methods for screening compounds to 
determine whether or not the particular compound is an antagonist of a symptom of a 
malarial infection. In particular, the screening methods of the present invention can be used 
to determine whether a test compound is an antagonist of the sequestration of erythrocytes 
which is associated with P. falciparum malaria. More particularly, the screening methods can 
determine whether a compound is an antagonist of the PfEMP1/ligand interaction. Ligands
of PfEMP1 generally include, e.g., CD36, TSP, ELAM-1, ICAM-1, VCAM-1 or chondroitin
sulfate. 

Generally, the screening methods of the present invention comprise contacting 
PfEMP1 protein, or a fragment thereof, and/or ligand protein, with a compound which is to
be screened ("test compound"). The level of PfEMP1/ligand complex formed may then be
detected and compared to a control, e.g., in the absence of the test compound. A decrease in
the level of PfEMP1/ligand interaction is indicative that the test compound is an antagonist of
that interaction. 

A test compound may be a chemical compound, a mixture of chemical compounds, a 
biological macromolecule, or an extract made from biological materials, such as bacteria, 
phage, yeast, plants, fungi, animal cells or tissues. Test compounds are evaluated for potential
activity as antagonists of PfEMP1/ligand interaction by inclusion in the screening assays
described herein. An "antagonist" refers to a compound which will diminish the level of
PfEMP1/ligand interaction, over a control.

It will often be desirable in the screening assays of the present invention to provide
one of the PfEMP1 or ligand proteins immobilized on a solid support. Suitable solid supports
include, e.g., agarose, cellulose, dextran, Sephadex, Sepharose, carboxymethyl cellulose,
polystyrene, filter paper, nitrocellulose, ion exchange resins, plastic films, glass beads,
polyaminemethylvinylether maleic acid copolymer, amino acid copolymer, ethylene-maleic
acid copolymer, nylon, silk, etc. The support may be in the form of, e.g., a test tube,
microtiter plate, beads, test strips, flat surface, e.g., for blotting formats, or the like. The
reaction of the PfEMP1 polypeptide or its ligand with the particular solid support may be
carried out by methods well known in the art, e.g., binding to an immobilized anti-PfEMP1
antibody, or binding to prederivatized solid support.

In addition to the foregoing, it may also be desirable to provide either the PfEMP1 or
its ligand linked to a suitable detectable group to simplify detection of binding of one protein to
the other. Useful detectable groups, or labels, are generally well known in the art.
For example, a detectable group may be a radiolabel, such as 125I, 32P or 35S, or a fluorescent
or chemiluminescent group.

Alternatively, the detectable group may be a substrate, cofactor, inhibitor, affinity 
ligand, antibody binding epitope tag, or an enzyme which is capable of being assayed. 
Suitable enzymes include, e.g., horseradish peroxidase, luciferase, or other readily
assayable enzymes. These enzyme groups may be attached to the PfEMP1 polypeptide, or its
ligand, by chemical means or may be expressed as a fusion protein, as already described.

Generally, where one of the above proteins, e.g., the PfEMP1 ligand, is immobilized
on a solid support, the other protein, e.g., PfEMP1 or its fragment, will be labeled with an
appropriate detectable group. Assaying whether a compound is an antagonist of the
interaction of the two proteins is then a matter of contacting the labeled PfEMP1 polypeptide
or fragment with the immobilized ligand, in the presence of the test compound, under
conditions which allow specific binding of the two proteins. The amount of label bound to
the solid support is compared to a control, where no test compound was added. Where a test
compound results in a reduction of the amount of label which binds to a solid support, that
compound is an antagonist of the PfEMP1/ligand interaction.
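
The comparison against a no-compound control can be expressed as a percent inhibition. The sketch below is illustrative only: the function names, the optional background correction and the 50% cutoff are assumptions, not parameters given in this document, and the signal values are invented to exercise the code.

```python
def percent_inhibition(bound_with_compound: float,
                       bound_control: float,
                       background: float = 0.0) -> float:
    """Percent reduction in bound label relative to the no-compound control."""
    control = bound_control - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (bound_with_compound - background) / control)


def is_antagonist(bound_with_compound: float,
                  bound_control: float,
                  background: float = 0.0,
                  threshold: float = 50.0) -> bool:
    """Flag a test compound whose inhibition meets a hypothetical cutoff."""
    inhibition = percent_inhibition(bound_with_compound, bound_control, background)
    return inhibition >= threshold


# Invented label counts: 250 with test compound, 1000 in the control well.
example_inhibition = percent_inhibition(250.0, 1000.0)
example_flag = is_antagonist(250.0, 1000.0)
```

Any reduction relative to control is consistent with the definition of an antagonist given above; a cutoff such as the one assumed here only serves to separate weak from strong candidates.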
Therapeutic And Prophylactic Applications

In addition to the above described uses, the polypeptides of the present invention may
also be used in therapeutic applications, for the treatment of human and/or non-human 
mammalian patients. The therapeutic uses of the polypeptides of the present invention 
include the treatment of symptoms of existing disorders, as well as prophylactic applications. 
The term "prophylactic" refers to the prevention of a particular disorder, or symptoms of a 
particular disorder. Thus, prophylactic treatments will generally include drugs which actively 
participate in the prevention of a particular disorder such as a malaria infection, or symptoms 
thereof. Prophylactic applications will also include treatments which elicit a preventative 
response from a patient, including, for example, an immunological response as in the case of 
vaccination. 

Typically, both therapeutic and prophylactic applications will comprise administering
an effective amount of the compositions of the present invention to a patient, to treat or
prevent symptoms, or the onset of a malarial parasite infection. An "effective amount", as the
term is used herein, is defined as the amount of the composition which is necessary to
achieve the desired goal, i.e., alleviation of symptoms, prevention of symptoms or infection,
or treatment of disease. 

In prophylactic applications, the polypeptides of the present invention may be used in 
a variety of treatments. For example, the polypeptides of the invention are particularly useful 
as a vaccine, to elicit an immunological response by a patient, e.g., production of antibodies 
specific for PfEMP1. In particular, such vaccine applications generally involve the
administration of the PfEMP1 protein or biologically active fragments thereof, to the host or
patient.

In response to this administration, the patient's immune system will generate 
antibodies to the particular PfEMP1 protein or fragment introduced. An amount of the
polypeptides sufficient to produce an immunological response in a patient is termed "an 
immunogenically effective amount." Thus, the vaccines of the present invention will contain 
an immunogenically effective amount of the polypeptides of the present invention. The 
immune response of the patient may include generation of antibodies, activation of cytotoxic 
T-lymphocytes against cells expressing the polypeptides, e.g., PE, or other mechanisms
known to the skilled artisan. See, e.g., Paul, Fundamental Immunology, 2d Edition, Raven
Press. Useful carriers are well known in the art, and include, for example, thyroglobulin,
albumins such as human serum albumin, tetanus toxoid, polyamino acids such as
poly(D-lysine:D-glutamic acid), influenza, hepatitis B virus core protein, and hepatitis B virus
recombinant vaccine. The vaccines can also contain a physiologically tolerable diluent, such
as water, buffered water, buffered saline, saline and typically may further include an
adjuvant, such as incomplete Freund's adjuvant, aluminum phosphate, aluminum hydroxide,
alum, or other materials well known in the art. 

Alternatively, the nucleic acids of the present invention may also be used as vaccines 
for the prevention of malaria symptoms, and/or infection by malaria parasites. See Sedegah,
et al., Proc. Natl. Acad. Sci. (1994) 91:9866-9870.

For example, plasmid DNA comprising the nucleic acids of the present invention may
be directly administered to a patient. Expression of this "naked" DNA will have effects
similar to the injection of the actual polypeptides, as described above. Specifically, the
patient's immune response to the presence of the proteins expressed from the DNA will
result in the production of antibodies to that protein. The nucleic acids may also be used to
design antisense probes to interrupt transcription of PfEMP1 peptides in parasitized
erythrocytes.

Antisense methods are generally well known in the art. The polypeptides of the 
present invention, and analogs thereof, may also be used as prophylactic treatments to 
prevent the onset of symptoms of malarial infection. For example, administration of the 
polypeptides can directly inhibit, block or reverse the sequestration of erythrocytes in 
patients suffering from P. falciparum malaria infections. In particular, the polypeptides of the 
invention may be used to compete with or displace PE-associated PfEMP1 in binding CD36.

The blockage or reversal of sequestration will reduce or eliminate the microvascular 
occlusion generally associated with the pathology of this type of malaria, which, again, can 
lead to destruction of the PE by the host. The antibodies of the invention may also be used in 
a similar fashion. In particular, the antibodies, which are capable of binding the polypeptides 
of the present invention, may be directly administered to a patient. By binding PfEMP1, the
antibodies of the present invention are effective in blocking, reducing or reversing PfEMP1
mediated interactions, e.g., erythrocyte sequestration. Chimeric, human-like or humanized
antibodies are particularly useful for administration to human patients. Additionally, such
antibodies may also be used as a passive vaccination method to provide a subject with a short
term immunization, much as anti-hepatitis A injections have been used previously. 

In alternative aspects, the polypeptides, antibodies and nucleic acids of the invention 
may be used to treat a patient already suffering from a malarial infection. In particular, the 
compositions of the present invention may be administered to a patient suffering from a 
malarial infection to treat symptoms associated with that infection. More particularly, these 
compositions may be administered to the patient to prevent or reduce erythrocyte 
sequestration and the resulting microvascular occlusion associated with malarial, and more 
specifically, P. falciparum, infections. 

Although the polypeptides, nucleic acids and antibodies of the present invention may 
be administered alone, for therapeutic and prophylactic applications, these elements will 
generally be administered as part of a pharmaceutical composition, e.g., in combination with 
a pharmaceutically acceptable carrier. Typically, a single composition may be used in both 
therapeutic and prophylactic applications. Pharmaceutical formulations suitable for use in 
the present invention are generally described in Remington's Pharmaceutical Sciences, Mack 
Publishing Co., 17th ed. (1985). 

The pharmaceutical compositions of the present invention are intended for parenteral, 
topical, oral, or local administration. Where the pharmaceutical compositions are 
administered parenterally, the invention provides pharmaceutical compositions that comprise 
a solution of the agents described above, e.g., polypeptides of the invention, dissolved or 
suspended in a pharmaceutically acceptable carrier, preferably an aqueous carrier. A variety 
of aqueous carriers may be used, e.g., water, buffered water, saline, glycine, and the like.
These compositions may be sterilized by conventional, well known methods, e.g., sterile 
filtration. The resulting aqueous solutions may be packaged for use as is, or lyophilized for 
combination with a sterile solution prior to administration. The compositions may contain
pharmaceutically acceptable auxiliary substances as required to approximate physiological 
conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting 
agents, and the like, for example sodium acetate, sodium lactate, sodium chloride, potassium 
chloride, calcium chloride, sorbitan monolaurate, triethanolamine oleate, etc.

For solid compositions, conventional nontoxic solid carriers may be used which 
include, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate,
sodium saccharin, talcum, cellulose, glucose, sucrose, magnesium carbonate, and the like.
For oral administration, a pharmaceutically acceptable nontoxic composition may be formed
by incorporating any of the normally employed excipients, such as the previously listed 
carriers, and generally, 10-95% of active ingredient, and more preferably 25-75% active 
ingredient. In addition, for oral administration of peptide based compounds, the 
pharmaceutical compositions may include the active ingredient as part of a matrix to prevent 
proteolytic degradation of the active ingredient by digestive processes, e.g., by providing the
pharmaceutical composition within a liposomal composition, according to methods well 
known in the art. See, e.g., Remington's Pharmaceutical Sciences, Mack Publishing Co., 17th 
Ed. (1985). 

For aerosol administration, the polypeptides are generally supplied in finely divided 
form along with a surfactant or propellant. Preferably, the surfactant will be soluble in the 
propellant. Representative of such agents are the esters or partial esters of fatty acids 
containing from 6 to 22 carbon atoms, such as caproic, octanoic, lauric, palmitic, stearic, 
linoleic, linolenic, olesteric and oleic acids, with an aliphatic polyhydric alcohol or its cyclic 
anhydride. Mixed esters, such as mixed or natural glycerides may be employed. A carrier can 
also be included, as desired, as with, e.g., lecithin for intranasal delivery. The above
described compositions are suitable for a single administration or a series of administrations. 

When given as a series, e.g., as a vaccine booster, the inoculations subsequent to the initial 
administration are given to boost the immune response, and are typically referred to as 
booster inoculations. 

The amount of the above compositions to be administered to the patient will vary 
depending upon what is to be administered to the patient, the state of the patient, the manner 
of administration, and the particular application, e.g., therapeutic or prophylactic. In 
therapeutic applications, the compositions are administered to the patient already suffering 
from a malarial infection, in an amount sufficient to inhibit the spread of the parasite through 
the erythrocytes, and thereby cure or at least partially arrest the symptoms of the disease and
its associated complications. 

An amount adequate to accomplish this is termed "a therapeutically effective 
amount." Amounts effective for this use will depend upon the severity of the disease and the
weight and general state of the patient, but will generally be in the range of from about 1 mg
to about 5 g of active agent per day, preferably from about 50 mg per day to about 500 mg 
per day, and more preferably, from about 50 mg to about 100 mg per day, for a 70 kg patient. 
For prophylactic applications, immunogenically effective amounts will also depend
upon the composition, the manner of administration and the weight and general state of the 
patient, as well as the judgment of the prescribing physician. For the peptide, peptide analog 
and antibody based pharmaceutical compositions, the general range for the initial 
immunization (for either prophylactic or therapeutic applications) will be from about 100 µg
to about 1 g of polypeptide for a 70 kg patient, followed by boosting dosages of from about 1
µg to about 1 g of polypeptide pursuant to a boosting regimen over weeks to months,
depending upon the patient's response and condition, e.g., by measuring the level of parasite
or antibodies in the patient's blood. For nucleic acids, typically from about 30 to about 100 µg
of nucleic acid is injected into a 70 kg patient, more typically, about 50 to 150 µg of nucleic
acid is injected, followed by boosting treatments as appropriate. 

DIRECTED EVOLUTION METHODS 

In one aspect the invention described herein is directed to the use of repeated cycles 
of reductive reassortment, recombination and selection which allow for the directed 
molecular evolution of highly complex linear sequences, such as DNA, RNA or proteins 
through recombination.

In vivo shuffling of molecules can be performed utilizing the natural property of cells 
to recombine multimers. While recombination in vivo has provided the major natural route 
to molecular diversity, genetic recombination remains a relatively complex process that 
involves 1) the recognition of homologies; 2) strand cleavage, strand invasion, and metabolic 
steps leading to the production of recombinant chiasma; and finally 3) the resolution of 
chiasma into discrete recombined molecules. The formation of the chiasma requires the 
recognition of homologous sequences. 

In a preferred embodiment, the invention relates to a method for producing a hybrid 
polynucleotide from at least a first polynucleotide and a second polynucleotide. The present 
invention can be used to produce a hybrid polynucleotide by introducing at least a first 
polynucleotide and a second polynucleotide which share at least one region of partial 
sequence homology into a suitable host cell. The regions of partial sequence homology 
promote processes which result in sequence reorganization producing a hybrid 
polynucleotide. The term "hybrid polynucleotide", as used herein, is any nucleotide
sequence which results from the method of the present invention and contains sequence from
at least two original polynucleotide sequences. Such hybrid polynucleotides can result from
intermolecular recombination events which promote sequence integration between DNA
molecules. In addition, such hybrid polynucleotides can result from intramolecular reductive
reassortment processes which utilize repeated sequences to alter a nucleotide sequence within
a DNA molecule.
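
By way of illustration only, the joining of two polynucleotides at a shared region can be sketched computationally. The following minimal Python sketch assumes, as simplifications not stated above, that the region of partial sequence homology is an exact shared substring and that a single crossover occurs at that region. The function names find_shared_region and make_hybrid, the min_len parameter, and the example sequences are hypothetical.

```python
def find_shared_region(first, second, min_len=6):
    """Return (start_in_first, start_in_second, length) of the longest
    exact common substring of at least min_len bases, or None."""
    best = None
    prev = [0] * (len(second) + 1)  # match-run lengths for the previous row
    for i in range(1, len(first) + 1):
        cur = [0] * (len(second) + 1)
        for j in range(1, len(second) + 1):
            if first[i - 1] == second[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] >= min_len and (best is None or cur[j] > best[2]):
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def make_hybrid(first, second, min_len=6):
    """Join the 5' part of `first` (through the shared region) to the
    3' part of `second` (after the shared region)."""
    region = find_shared_region(first, second, min_len)
    if region is None:
        return None  # no shared region, so no crossover in this toy model
    start_first, start_second, length = region
    return first[: start_first + length] + second[start_second + length:]


if __name__ == "__main__":
    a = "ATGAAACCCGGGTTTAAA"  # hypothetical first polynucleotide
    b = "ATGCCCCCCGGGTTTGGG"  # hypothetical second polynucleotide
    print(make_hybrid(a, b))
```

The resulting string contains sequence from both inputs, which is the defining property of a hybrid polynucleotide as the term is used above. A real recombination event tolerates mismatches within the homologous region, which this sketch does not model.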

The invention provides a means for generating hybrid polynucleotides which may
encode biologically active hybrid polypeptides. In one aspect, the original polynucleotides
encode biologically active polypeptides. The method of the invention produces new hybrid
polypeptides by utilizing cellular processes which integrate the sequence of the original
polynucleotides such that the resulting hybrid polynucleotide encodes a polypeptide
demonstrating activities derived from the original biologically active polypeptides. For
example, the original polynucleotides may encode a particular enzyme from different
microorganisms. An enzyme encoded by a first polynucleotide from one organism may, for
example, function effectively under a particular environmental condition, e.g., high salinity.
An enzyme encoded by a second polynucleotide from a different organism may function
effectively under a different environmental condition, such as extremely high temperatures.
A hybrid polynucleotide containing sequences from the first and second original
polynucleotides may encode an enzyme which exhibits characteristics of both enzymes
encoded by the original polynucleotides. Thus, the enzyme encoded by the hybrid
polynucleotide may function effectively under environmental conditions shared by each of
the enzymes encoded by the first and second polynucleotides, e.g., high salinity and extreme
temperatures.

Enzymes encoded by the original polynucleotides of the invention include, but are not
limited to: oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. A
hybrid polypeptide resulting from the method of the invention may exhibit specialized
enzyme activity not displayed in the original enzymes. For example, following
recombination and/or reductive reassortment of polynucleotides encoding hydrolase
activities, the resulting hybrid polypeptide encoded by a hybrid polynucleotide can be
screened for specialized hydrolase activities obtained from each of the original enzymes, i.e.,
the type of bond on which the hydrolase acts and the temperature at which the hydrolase
functions. Thus, for example, the hydrolase may be screened to ascertain those chemical
functionalities which distinguish the hybrid hydrolase from the original hydrolases, such as:
(a) amide (peptide bonds), i.e., proteases; (b) ester bonds, i.e., esterases and lipases; (c)
acetals, i.e., glycosidases; and, for example, the temperature, pH or salt concentration at
which the hybrid polypeptide functions.

Sources of the original polynucleotides may be isolated from individual organisms 
("isolates"), collections of organisms that have been grown in defined media ("enrichment
cultures"), or, most preferably, uncultivated organisms ("environmental samples"). The use 
of a culture-independent approach to derive polynucleotides encoding novel bioactivities 
from environmental samples is most preferable since it allows one to access untapped 
resources of biodiversity. 

"Environmental libraries" are generated from environmental samples and represent
the collective genomes of naturally occurring organisms archived in cloning vectors that can
be propagated in suitable prokaryotic hosts. Because the cloned DNA is initially extracted
directly from environmental samples, the libraries are not limited to the small fraction of
prokaryotes that can be grown in pure culture. Additionally, a normalization of the
environmental DNA present in these samples could allow more equal representation of the
DNA from all of the species present in the original sample. This can dramatically increase
the efficiency of finding interesting genes from minor constituents of the sample which may
be under-represented by several orders of magnitude compared to the dominant species.
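
The effect of normalization on screening effort can be illustrated with a small worked calculation. The sketch below is an assumption-laden toy model: the genome copy numbers are invented, normalization is idealized as perfectly equal representation, and clones are assumed to be drawn independently at random, so that the expected number of clones examined before the first clone from a given species is the reciprocal of that species' fraction of the library. All function names are hypothetical.

```python
from fractions import Fraction


def clone_fractions(genome_copies):
    """Fraction of library clones expected from each species,
    assuming clones are drawn in proportion to genome copies."""
    total = sum(genome_copies.values())
    return {species: Fraction(n, total) for species, n in genome_copies.items()}


def normalize(genome_copies):
    """Idealized normalization: every species equally represented."""
    return {species: 1 for species in genome_copies}


def expected_clones_to_first_hit(fraction):
    """Mean of a geometric distribution with success probability `fraction`."""
    return 1 / fraction


if __name__ == "__main__":
    # Hypothetical sample: a minor constituent under-represented 1000-fold.
    sample = {"dominant": 1000, "minor": 1}
    raw = clone_fractions(sample)
    norm = clone_fractions(normalize(sample))
    print(expected_clones_to_first_hit(raw["minor"]))
    print(expected_clones_to_first_hit(norm["minor"]))
```

Under these assumptions, the screening effort needed to reach a gene from the minor constituent falls roughly in proportion to the degree of its original under-representation.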

For example, gene libraries generated from one or more uncultivated microorganisms
are screened for an activity of interest. Potential pathways encoding bioactive molecules of
interest are first captured in prokaryotic cells in the form of gene expression libraries.
Polynucleotides encoding activities of interest are isolated from such libraries and introduced
into a host cell. The host cell is grown under conditions which promote recombination
and/or reductive reassortment creating potentially active biomolecules with novel or
enhanced activities.

The microorganisms from which the polynucleotide may be prepared include
prokaryotic microorganisms, such as Eubacteria and Archaebacteria, and lower eukaryotic
microorganisms such as fungi, some algae and protozoa. Polynucleotides may be isolated
from environmental samples in which case the nucleic acid may be recovered without
culturing of an organism or recovered from one or more cultured organisms. In one aspect,
such microorganisms may be extremophiles, such as hyperthermophiles, psychrophiles,
psychrotrophs, halophiles, barophiles and acidophiles. Polynucleotides encoding enzymes
isolated from extremophilic microorganisms are particularly preferred. Such enzymes may
function at temperatures above 100°C in terrestrial hot springs and deep sea thermal vents, at
temperatures below 0°C in arctic waters, in the saturated salt environment of the Dead Sea, at
pH values around 0 in coal deposits and geothermal sulfur-rich springs, or at pH values
greater than 11 in sewage sludge. For example, several esterases and lipases cloned and
expressed from extremophilic organisms show high activity throughout a wide range of
temperatures and pHs.

Polynucleotides selected and isolated as hereinabove described are introduced into a
suitable host cell. A suitable host cell is any cell which is capable of promoting
recombination and/or reductive reassortment. The selected polynucleotides are preferably
already in a vector which includes appropriate control sequences. The host cell can be a
higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast
cell, or preferably, the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction
of the construct into the host cell can be effected by calcium phosphate transfection,
DEAE-Dextran mediated transfection, or electroporation (Davis et al., 1986).

As representative examples of appropriate hosts, there may be mentioned: bacterial
cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells, such as yeast;
insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or
Bowes melanoma; adenoviruses; and plant cells. The selection of an appropriate host is
deemed to be within the scope of those skilled in the art from the teachings herein.

With particular reference to various mammalian cell culture systems that can be
employed to express recombinant protein, examples of mammalian expression systems
include the COS-7 lines of monkey kidney fibroblasts, described in "SV40-transformed
simian cells support the replication of early SV40 mutants" (Gluzman, 1981), and other cell
lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa
and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a
suitable promoter and enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences,
and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 splice
and polyadenylation sites may be used to provide the required nontranscribed genetic
elements.
Host cells containing the polynucleotides of interest can be cultured in conventional
nutrient media modified as appropriate for activating promoters, selecting transformants or 
amplifying genes. The culture conditions, such as temperature, pH and the like, are those 
previously used with the host cell selected for expression, and will be apparent to the 
ordinarily skilled artisan. The clones which are identified as having the specified enzyme 
activity may then be sequenced to identify the polynucleotide sequence encoding an enzyme 
having the enhanced activity. 

In another aspect, it is envisioned the method of the present invention can be used to 
generate novel polynucleotides encoding biochemical pathways from one or more operons or 
gene clusters or portions thereof. For example, bacteria and many eukaryotes have a 
coordinated mechanism for regulating genes whose products are involved in related 
processes. The genes are clustered, in structures referred to as "gene clusters," on a single 
chromosome and are transcribed together under the control of a single regulatory sequence, 
including a single promoter which initiates transcription of the entire cluster. Thus, a gene 
cluster is a group of adjacent genes that are either identical or related, usually as to their 
function. An example of a biochemical pathway encoded by gene clusters is polyketide biosynthesis.
Polyketides are molecules which are an extremely rich source of bioactivities, including 
antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin), 
immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many 
polyketides (produced by polyketide synthases) are valuable as therapeutic agents. 
Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of an 
enormous variety of carbon chains differing in length and patterns of functionality and 
cyclization. Polyketide synthase genes fall into gene clusters, and at least one type
(designated type I) of polyketide synthase has large genes and enzymes, complicating
genetic manipulation and in vitro studies of these genes/proteins.

The ability to select and combine desired components from a library of polyketides, 
or fragments thereof, and postpolyketide biosynthesis genes for generation of novel 
polyketides for study is appealing. The method of the present invention makes it possible to 
facilitate the production of novel polyketide synthases through intermolecular recombination. 

Preferably, gene cluster DNA can be isolated from different organisms and ligated
into vectors, particularly vectors containing expression regulatory sequences which can
control and regulate the production of a detectable protein or protein-related array activity
from the ligated gene clusters. Use of vectors which have an exceptionally large capacity for
exogenous DNA introduction are particularly appropriate for use with such gene clusters and 
are described by way of example herein to include the f-factor (or fertility factor) of E. coli.
This f-factor of E. coli is a plasmid which effects high-frequency transfer of itself during
conjugation and is ideal to achieve and stably propagate large DNA fragments, such as gene 
clusters from mixed microbial samples. Once ligated into an appropriate vector, two or more 
vectors containing different polyketide synthase gene clusters can be introduced into a 
suitable host cell. Regions of partial sequence homology shared by the gene clusters will 
promote processes which result in sequence reorganization resulting in a hybrid gene cluster. 
The novel hybrid gene cluster can then be screened for enhanced activities not found in the 
original gene clusters. 

Therefore, in a preferred embodiment, the present invention relates to a method for 
producing a biologically active hybrid polypeptide and screening such a polypeptide for 
enhanced activity by: 

1) introducing at least a first polynucleotide in operable linkage and a second 
polynucleotide in operable linkage, said at least first polynucleotide and second 
polynucleotide sharing at least one region of partial sequence homology, into a 
suitable host cell; 

2) growing the host cell under conditions which promote sequence reorganization 
resulting in a hybrid polynucleotide in operable linkage; 

3) expressing a hybrid polypeptide encoded by the hybrid polynucleotide; 

4) screening the hybrid polypeptide under conditions which promote identification 
of enhanced biological activity; and 

5) isolating a polynucleotide encoding the hybrid polypeptide.
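
Steps 4) and 5) above amount to ranking candidate hybrids against the original sequences. As a purely illustrative sketch, the Python fragment below keeps only those hybrids whose measured activity exceeds that of every parent. The activity function here is a hypothetical stand-in (it simply counts one base) for whatever laboratory assay would actually be used; the names screen_hybrids and toy_activity and all sequences are likewise hypothetical.

```python
def screen_hybrids(hybrids, parents, activity):
    """Return (hybrid, score) pairs whose score exceeds the best parent,
    sorted from highest to lowest score."""
    baseline = max(activity(p) for p in parents)
    hits = [(h, activity(h)) for h in hybrids if activity(h) > baseline]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def toy_activity(sequence):
    """Hypothetical assay stand-in: NOT a biological measure."""
    return sequence.count("G")


if __name__ == "__main__":
    parents = ["ATGAAA", "ATGCCC"]
    hybrids = ["ATGGGA", "ATGAAC", "GGGGGG"]
    for hybrid, score in screen_hybrids(hybrids, parents, toy_activity):
        print(hybrid, score)
```

Passing the assay as a function keeps the selection logic independent of the particular enzyme activity being screened.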

Methods for screening for various enzyme activities are known to those of skill in the 
art and discussed throughout the present specification. Such methods may be employed 
when isolating the polypeptides and polynucleotides of the present invention. 

As representative examples of expression vectors which may be used there may be 
mentioned viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, 
bacterial artificial chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowl pox virus,
pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids,
yeast artificial chromosomes, and any other vectors specific for specific hosts of interest
(such as bacillus, aspergillus and yeast). Thus, for example, the DNA may be included in 
any one of a variety of expression vectors for expressing a polypeptide. Such vectors include 
chromosomal, nonchromosomal and synthetic DNA sequences. Large numbers of suitable 
vectors are known to those of skill in the art, and are commercially available. The following 
vectors are provided by way of example: Bacterial: pQE vectors (Qiagen), pBluescript
plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540,
pRIT2T (Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG,
pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be used as long as
it is replicable and viable in the host. Low copy number or high copy number vectors
may be employed with the present invention. 

A preferred type of vector for use in the present invention contains an f-factor origin
of replication. The f-factor (or fertility factor) in E. coli is a plasmid which effects high
frequency transfer of itself during conjugation and less frequent transfer of the bacterial
chromosome itself. A particularly preferred embodiment is to use cloning vectors, referred
to as "fosmids" or bacterial artificial chromosome (BAC) vectors. These are derived from the
E. coli f-factor, which is able to stably integrate large segments of genomic DNA. When
integrated with DNA from a mixed uncultured environmental sample, this makes it possible 
to achieve large genomic fragments in the form of a stable "environmental DNA library." 

Another preferred type of vector for use in the present invention is a cosmid vector. 
Cosmid vectors were originally designed to clone and propagate large segments of genomic 
DNA. Cloning into cosmid vectors is described in detail in "Molecular Cloning: A 
Laboratory Manual" (Sambrook et al., 1989).

The DNA sequence in the expression vector is operatively linked to an appropriate 
expression control sequence(s) (promoter) to direct RNA synthesis. Particular named
bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs
from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and
promoter is well within the level of ordinary skill in the art. The expression vector also 
contains a ribosome binding site for translation initiation and a transcription terminator. The 
vector may also include appropriate sequences for amplifying expression. Promoter regions
can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or
other vectors with selectable markers. 

In addition, the expression vectors preferably contain one or more selectable marker
genes to provide a phenotypic trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as
tetracycline or ampicillin resistance in E. coli.

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance
gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed
gene to direct transcription of a downstream structural sequence. Such promoters can be
derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase
(PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous
structural sequence is assembled in appropriate phase with translation initiation and
termination sequences, and preferably, a leader sequence capable of directing secretion of
translated protein into the periplasmic space or extracellular medium.

The cloning strategy permits expression via both vector driven and endogenous 
promoters; vector promotion may be important with expression of genes whose endogenous 
promoter will not function in E. coli.

The DNA isolated or derived from microorganisms can preferably be inserted into a
vector or a plasmid prior to probing for selected DNA. Such vectors or plasmids are
preferably those containing expression regulatory sequences, including promoters, enhancers
and the like. Such polynucleotides can be part of a vector and/or a composition and still be
isolated, in that such vector or composition is not part of its natural environment.
Particularly preferred phage or plasmid and methods for introduction and packaging into
them are described in detail in the protocol set forth herein.

The selection of the cloning vector depends upon the approach taken. For example,
the vector can be any cloning vector with an adequate capacity for multiply repeated copies
of a sequence, or multiple sequences that can be successfully transformed and selected in a
host cell. One example of such a vector is described in "Polycos vectors: a system for
packaging filamentous phage and phagemid vectors using lambda phage packaging extracts"
(Alting-Mees and Short, 1993). Propagation/maintenance can be by an antibiotic resistance
carried by the cloning vector. After a period of growth, the naturally abbreviated molecules
are recovered and identified by size fractionation on a gel or column, or amplified directly. 
The cloning vector utilized may contain a selectable gene that is disrupted by the insertion of 
the lengthy construct. As reductive reassortment progresses, the number of repeated units is 
reduced and the interrupted gene is again expressed and hence selection for the processed 
construct can be applied. The vector may be an expression/selection vector which will allow 
for the selection of an expressed product possessing desirable biological properties. The
insert may be positioned downstream of a functional promoter and the desirable property 
screened by appropriate means. 

In vivo reassortment is focused on "inter-molecular" processes collectively referred to
as "recombination" which, in bacteria, is generally viewed as a "RecA-dependent"
phenomenon. The present invention can rely on recombination processes of a host cell to
recombine and re-assort sequences, or the cells' ability to mediate reductive processes to
decrease the complexity of quasi-repeated sequences in the cell by deletion. This process of
"reductive reassortment" occurs by an "intra-molecular", RecA-independent process.

Therefore, in another aspect of the present invention, novel polynucleotides can be 
generated by the process of reductive reassortment. The method involves the generation of 
constructs containing consecutive sequences (original encoding sequences), their insertion 
into an appropriate vector, and their subsequent introduction into an appropriate host cell. 
The reassortment of the individual molecular identities occurs by combinatorial processes 
between the consecutive sequences in the construct possessing regions of homology, or 
between quasi-repeated units. The reassortment process recombines and/or reduces the
complexity and extent of the repeated sequences, and results in the production of novel 
molecular species. Various treatments may be applied to enhance the rate of reassortment. 
These could include treatment with ultra-violet light, or DNA damaging chemicals, and/or 
the use of host cell lines displaying enhanced levels of "genetic instability". Thus the 
reassortment process may involve homologous recombination or the natural property of 
quasi-repeated sequences to direct their own evolution. 

Repeated or "quasi-repeated" sequences play a role in genetic instability. In the 
present invention, "quasi-repeats" are repeats that are not restricted to their original unit 
structure. Quasi-repeated units can be presented as an array of sequences in a construct; 
consecutive units of similar sequences. Once ligated, the junctions between the consecutive
sequences become essentially invisible and the quasi-repetitive nature of the resulting
construct is now continuous at the molecular level. The deletion process the cell performs to
reduce the complexity of the resulting construct operates between the quasi-repeated 
sequences. The quasi-repeated units provide a practically limitless repertoire of templates 
upon which slippage events can occur. The constructs containing the quasi-repeats thus 
effectively provide sufficient molecular elasticity that deletion (and potentially insertion) 
events can occur virtually anywhere within the quasi-repetitive units. 

When the quasi-repeated sequences are all ligated in the same orientation, for 
instance head to tail or vice versa, the cell cannot distinguish individual units. Consequently, 
the reductive process can occur throughout the sequences. In contrast, when for example, the
units are presented head to head, rather than head to tail, the inversion delineates the 
endpoints of the adjacent unit so that deletion formation will favor the loss of discrete units. 
Thus, it is preferable with the present method that the sequences are in the same orientation. 
Random orientation of quasi-repeated sequences will result in the loss of reassortment 
efficiency, while consistent orientation of the sequences will offer the highest efficiency. 
However, while having fewer of the contiguous sequences in the same orientation decreases 
the efficiency, it may still provide sufficient elasticity for the effective recovery of novel 
molecules. Constructs can be made with the quasi-repeated sequences in the same 
orientation to allow higher efficiency. 
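
The distinction between uniformly oriented and inverted units can be made concrete with a short sketch. In the hypothetical representation below, each quasi-repeated unit is a pair of a sequence and an orientation flag, with "+" taken to mean the head to tail orientation and "-" an inverted unit (represented by its reverse complement). The function inversion_junctions reports the positions at which adjacent units differ in orientation, which in the discussion above are the points that delineate unit endpoints. The representation and all names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return sequence.translate(COMPLEMENT)[::-1]


def assemble(units):
    """Concatenate (sequence, orientation) units into one construct."""
    return "".join(
        seq if orientation == "+" else reverse_complement(seq)
        for seq, orientation in units
    )


def inversion_junctions(units):
    """Indices i where unit i is oriented differently from unit i - 1."""
    return [i for i in range(1, len(units)) if units[i][1] != units[i - 1][1]]


def is_uniform(units):
    """True when every unit shares one orientation (no inversions)."""
    return not inversion_junctions(units)


if __name__ == "__main__":
    mixed = [("ATGC", "+"), ("ATGC", "+"), ("ATGC", "-")]
    print(assemble(mixed), inversion_junctions(mixed), is_uniform(mixed))
```

A construct for which is_uniform returns True corresponds to the preferred case in which the sequences are all in the same orientation.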

Sequences can be assembled in a head to tail orientation using any of a variety of 
methods, including the following: 

a) Primers that include a poly-A head and poly-T tail which when made single- 
stranded would provide orientation can be utilized. This is accomplished by
having the first few bases of the primers made from RNA and hence easily
removed by RNase H.

b) Primers that include unique restriction cleavage sites can be utilized. Multiple 
sites, a battery of unique sequences, and repeated synthesis and ligation steps 
would be required. 

c) The inner few bases of the primer could be thiolated and an exonuclease used to 
produce properly tailed molecules. 
The recovery of the re-assorted sequences relies on the identification of cloning
vectors with a reduced RI. The re-assorted encoding sequences can then be recovered by 
amplification. The products are re-cloned and expressed. The recovery of cloning vectors 
with reduced RI can be effected by: 

1) The use of vectors only stably maintained when the construct is reduced in 
complexity. 

2) The physical recovery of shortened vectors by physical procedures. In this case, the 
cloning vector would be recovered using standard plasmid isolation procedures and 
size fractionated on either an agarose gel, or column with a low molecular weight cut 
off utilizing standard procedures. 

3) The recovery of vectors containing interrupted genes which can be selected when 
insert size decreases. 

4) The use of direct selection techniques with an expression vector and the appropriate 
selection. 
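
Recovery option 2) above, size fractionation with a low molecular weight cut off, can be sketched as a simple filter. The example assumes, for illustration only, that every quasi-repeated unit has the same length and that the size of a cloning vector is the backbone length plus the combined length of its units; the backbone size, unit size and cut off used in the example are invented values, and the function names are hypothetical.

```python
def plasmid_size(backbone_bp, unit_bp, n_units):
    """Total size in base pairs of a vector carrying n_units repeats."""
    return backbone_bp + unit_bp * n_units


def recover_below_cutoff(unit_counts, backbone_bp, unit_bp, cutoff_bp):
    """Keep the unit counts whose plasmid size is at or below the cut off."""
    return [
        n for n in unit_counts
        if plasmid_size(backbone_bp, unit_bp, n) <= cutoff_bp
    ]


if __name__ == "__main__":
    # Hypothetical population: vectors carrying 1, 2, 5 and 8 units.
    print(recover_below_cutoff([1, 2, 5, 8], backbone_bp=3000,
                               unit_bp=1000, cutoff_bp=5000))
```

Only the vectors whose inserts have been reduced pass the cut off, which mirrors the physical recovery of shortened vectors described above.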

Encoding sequences (for example, genes) from related organisms may demonstrate a 
high degree of homology and encode quite diverse protein products. These types of 
sequences are particularly useful in the present invention as quasi-repeats. However, while 
the examples illustrated below demonstrate the reassortment of nearly identical original 
encoding sequences (quasi-repeats), this process is not limited to such nearly identical 
repeats. 

The following example demonstrates the method of the invention. Encoding nucleic 
acid sequences (quasi-repeats) derived from three (3) unique species are depicted. Each 
sequence encodes a protein with a distinct set of properties. Each of the sequences differs by 
a single or a few base pairs at a unique position in the sequence which are designated "A", 
"B" and "C". The quasi-repeated sequences are separately or collectively amplified and 
ligated into random assemblies such that all possible permutations and combinations are 
available in the population of ligated molecules. The number of quasi-repeat units can be 
controlled by the assembly conditions. The average number of quasi-repeated units in a 
construct is defined as the repetitive index (RI). 
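
As an illustration only, the random assembly and the repetitive index described above can be sketched in Python. The function names `all_assemblies` and `repetitive_index`, and the representation of each quasi-repeat unit as a single letter, are assumptions made for this sketch and are not part of the specification.

```python
from itertools import product

UNITS = ("A", "B", "C")  # labels for the three quasi-repeat units of the example


def all_assemblies(units, n_units):
    """Return every ordered assembly made of n_units quasi-repeat units."""
    return ["".join(combo) for combo in product(units, repeat=n_units)]


def repetitive_index(constructs):
    """Average number of quasi-repeat units per construct (the RI)."""
    return sum(len(c) for c in constructs) / len(constructs)


if __name__ == "__main__":
    trimers = all_assemblies(UNITS, 3)
    print(len(trimers))                             # ordered three-unit assemblies
    print(repetitive_index(["ABC", "AB", "CCBA"]))  # mean units per construct
```

With three units and three positions the enumeration yields 27 ordered assemblies; the RI of a mixed population is simply the mean construct length counted in units.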

Once formed, the constructs may or may not be size fractionated on an agarose gel 
according to published protocols, inserted into a cloning vector, and transfected into an 
appropriate host cell. The cells are then propagated and "reductive reassortment" is effected. 
The rate of the reductive reassortment process may be stimulated by the introduction of DNA 
damage if desired. Whether the reduction in RI is mediated by deletion formation between 
repeated sequences by an "intra-molecular" mechanism, or mediated by recombination-like 
events through "inter-molecular" mechanisms is immaterial. The end result is a reassortment 
of the molecules into all possible combinations. 
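
The reduction in RI during propagation can be pictured with a deliberately simplified toy model. The sketch below assumes that each event deletes the units lying between two randomly chosen repeats of a construct; the names `reduce_once` and `reassort`, the string representation of constructs, and the random choice of deletion end points are all hypothetical, and the formation of hybrid units at the junction is ignored.

```python
import random


def reduce_once(construct, rng):
    """Delete the units lying between two randomly chosen repeats.

    `construct` is a string of unit labels. Units i+1..j are removed, so at
    least one unit always remains.
    """
    if len(construct) < 2:
        return construct
    i = rng.randrange(len(construct) - 1)
    j = rng.randrange(i + 1, len(construct))
    return construct[: i + 1] + construct[j + 1:]


def reassort(population, rounds, seed=0):
    """Apply `rounds` of deletion to every construct; record the RI after each round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        population = [reduce_once(c, rng) for c in population]
        history.append(sum(len(c) for c in population) / len(population))
    return population, history


if __name__ == "__main__":
    final, ri_history = reassort(["ABCABC", "CBACBA", "AABBCC"], rounds=3)
    print(final, ri_history)
```

In this model the RI can only fall or stay constant from round to round, which mirrors the selection for vectors of reduced RI described above.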

Optionally, the method comprises the additional step of screening the library 
members of the shuffled pool to identify individual shuffled library members having the 
ability to bind or otherwise interact (e.g., such as catalytic antibodies) with a predetermined 
macromolecule, such as for example a proteinaceous receptor, peptide, oligosaccharide, virion, 
or other predetermined compound or structure. 

The displayed polypeptides, antibodies, peptidomimetic antibodies, and variable 
region sequences that are identified from such libraries can be used for therapeutic, 
diagnostic, research and related purposes (e.g., catalysts, solutes for increasing osmolality of 
an aqueous solution, and the like), and/or can be subjected to one or more additional cycles 
of shuffling and/or affinity selection. The method can be modified such that the step of 
selecting for a phenotypic characteristic can be other than of binding affinity for a 
predetermined molecule (e.g., for catalytic activity, stability, oxidation resistance, drug 
resistance, or detectable phenotype conferred upon a host cell). 

The present invention provides a method for generating libraries of displayed 
antibodies suitable for affinity interaction screening. The method comprises (1) obtaining 
a first plurality of selected library members comprising a displayed antibody and an 
associated polynucleotide encoding said displayed antibody, and obtaining said associated 
polynucleotides or copies thereof, wherein said associated polynucleotides comprise a region 
of substantially identical variable region framework sequence, and (2) introducing said 
polynucleotides into a suitable host cell and growing the cells under conditions which 
promote recombination and reductive reassortment resulting in shuffled polynucleotides. 
CDR combinations comprised by the shuffled pool are not present in the first plurality of 
selected library members, said shuffled pool composing a library of displayed antibodies 
comprising CDR permutations and suitable for affinity interaction screening. Optionally, the 
shuffled pool is subjected to affinity screening to select shuffled library members which bind 
to a predetermined epitope (antigen) and thereby selecting a plurality of selected shuffled 
library members. Further, the plurality of selectively shuffled library members can be 
shuffled and screened iteratively, from 1 to about 1000 cycles or as desired until library 
members having a desired binding affinity are obtained. 

In another aspect of the invention, it is envisioned that prior to or during 
recombination or reassortment, polynucleotides generated by the method of the present 
invention can be subjected to agents or processes which promote the introduction of 
mutations into the original polynucleotides. The introduction of such mutations would 
increase the diversity of resulting hybrid polynucleotides and polypeptides encoded 
therefrom. The agents or processes which promote mutagenesis can include, but are not 
limited to: (+)-CC-1065, or a synthetic analog such as (+)-CC-1065-(N3-Adenine) (see Sun 
and Hurley, 1992); an N-acetylated or deacetylated 4'-fluoro-4-aminobiphenyl adduct capable 
of inhibiting DNA synthesis (see, for example, van de Poll et al., 1992); or an N-acetylated or 
deacetylated 4-aminobiphenyl adduct capable of inhibiting DNA synthesis (see also, van de 
Poll et al., 1992, pp. 751-758); trivalent chromium, a trivalent chromium salt, a polycyclic 
aromatic hydrocarbon ("PAH") DNA adduct capable of inhibiting DNA replication, such as 
7-bromomethyl-benz[a]anthracene ("BMA"), tris(2,3-dibromopropyl)phosphate ("Tris-BP"), 
1,2-dibromo-3-chloropropane ("DBCP"), 2-bromoacrolein ("2BA"), 
benzo[a]pyrene-7,8-dihydrodiol-9,10-epoxide ("BPDE"), a platinum(II) halogen salt, 
N-hydroxy-2-amino-3-methylimidazo[4,5-f]-quinoline ("N-hydroxy-IQ"), and 
N-hydroxy-2-amino-1-methyl-6-phenylimidazo[4,5-f]-pyridine ("N-hydroxy-PhIP"). Especially 
preferred means for slowing or halting PCR amplification consist of UV light, (+)-CC-1065 and 
(+)-CC-1065-(N3-Adenine). Particularly encompassed means are DNA adducts or polynucleotides 
comprising the DNA adducts from the polynucleotides or polynucleotides pool, which can be 
released or removed by a process including heating the solution comprising the 
polynucleotides prior to further processing. 

In another aspect the present invention is directed to a method of producing 
recombinant proteins having biological activity by treating a sample comprising 
double-stranded template polynucleotides encoding a wild-type protein under conditions 
according to the present invention which provide for the production of hybrid or 
re-assorted polynucleotides. 

The invention also provides the use of polynucleotide shuffling to shuffle a 
population of viral genes (e.g., capsid proteins, spike glycoproteins, polymerases, and 
proteases) or viral genomes (e.g., paramyxoviridae, orthomyxoviridae, herpesviruses, 
retroviruses, reoviruses and rhinoviruses). In an embodiment, the invention provides a 
method for shuffling sequences encoding all or portions of immunogenic viral proteins to 
generate novel combinations of epitopes as well as novel epitopes created by recombination; 
such shuffled viral proteins may comprise epitopes or combinations of epitopes which are 
likely to arise in the natural environment as a consequence of viral evolution (e.g., such 
as recombination of influenza virus strains). 

The invention also provides a method suitable for shuffling polynucleotide sequences 
for generating gene therapy vectors and replication-defective gene therapy constructs, such as 
may be used for human gene therapy, including but not limited to vaccination vectors for 
DNA-based vaccination, as well as antineoplastic gene therapy and other general therapy 
formats. 

In the polypeptide notation used herein, the left-hand direction is the amino terminal 
direction and the right-hand direction is the carboxy-terminal direction, in accordance with 
standard usage and convention. Similarly, unless specified otherwise, the left-hand end of 
single-stranded polynucleotide sequences is the 5' end; the left-hand direction of 
double-stranded polynucleotide sequences is referred to as the 5' direction. The direction of 
5' to 3' addition of nascent RNA transcripts is referred to as the transcription direction; 
sequence regions on the DNA strand having the same sequence as the RNA and which are 5' to 
the 5' end of the RNA transcript are referred to as "upstream sequences"; sequence regions 
on the DNA strand having the same sequence as the RNA and which are 3' to the 3' end of the 
coding RNA transcript are referred to as "downstream sequences". 

Saturation Mutagenesis 

In one aspect, this invention provides for the use of proprietary codon primers 
(containing a degenerate N,N,G/T sequence) to introduce point mutations into a 
polynucleotide, so as to generate a set of progeny polypeptides in which a full range of single 
amino acid substitutions is represented at each amino acid position. The oligos used are 
comprised contiguously of a first homologous sequence, a degenerate N,N,G/T sequence, 
and preferably but not necessarily a second homologous sequence. The downstream progeny 
translational products from the use of such oligos include all possible amino acid changes at 
each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence 
includes codons for all 20 amino acids. 
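
The coverage of the N,N,G/T cassette can be checked by enumeration. The following is a minimal sketch that assumes the standard genetic code; the names `CODON_TABLE` and `expand_triplet` are introduced here for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_triplet(first, second, third):
    """List every codon allowed by the bases permitted at each position."""
    return ["".join(c) for c in product(first, second, third)]


nnk_codons = expand_triplet("ACGT", "ACGT", "GT")  # the N,N,G/T cassette
encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = encoded - {"*"}                       # "*" marks a stop codon

print(len(nnk_codons), len(amino_acids), "*" in encoded)
```

The enumeration gives 32 codons that together encode all 20 amino acids. Under the standard code the set also contains one stop codon (TAG), so a fraction of progeny will be truncated.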

In one aspect, one such degenerate oligo (comprised of one degenerate N,N,G/T 
cassette) is used for subjecting each original codon in a parental polynucleotide template to a 
full range of codon substitutions. In another aspect, at least two degenerate N,N,G/T 
cassettes are used, either in the same oligo or not, for subjecting at least two original codons 
in a parental polynucleotide template to a full range of codon substitutions. Thus, more than 
one N,N,G/T sequence can be contained in one oligo to introduce amino acid mutations at 
more than one site. This plurality of N,N,G/T sequences can be directly contiguous, or 
separated by one or more additional nucleotide sequence(s). In another aspect, oligos 
serviceable for introducing additions and deletions can be used either alone or in combination 
with the codons containing an N,N,G/T sequence, to introduce any combination or 
permutation of amino acid additions, deletions, and/or substitutions. 

In a particular exemplification, it is possible to simultaneously mutagenize two or 
more contiguous amino acid positions using an oligo that contains contiguous N,N,G/T 
triplets, i.e. a degenerate (N,N,G/T)n sequence. 

In another aspect, the present invention provides for the use of degenerate cassettes 
having less degeneracy than the N,N,G/T sequence. For example, it may be desirable in 
some instances to use (e.g. in an oligo) a degenerate triplet sequence comprised of only one 
N, where said N can be in the first, second or third position of the triplet. Any other bases 
including any combinations and permutations thereof can be used in the remaining two 
positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g. in an 
oligo) a degenerate N,N,N triplet sequence, or an N,N,G/C triplet sequence. 

It is appreciated, however, that the use of a degenerate triplet (such as N,N,G/T or an 
N,N,G/C triplet sequence) as disclosed in the instant invention is advantageous for several 
reasons. In one aspect, this invention provides a means to systematically and fairly easily 
generate the substitution of the full range of possible amino acids (for a total of 20 amino 
acids) into each and every amino acid position in a polypeptide. Thus, for a 100 amino acid 
polypeptide, the instant invention provides a way to systematically and fairly easily generate 
2000 distinct species (i.e. 20 possible amino acids per position X 100 amino acid positions). 
It is appreciated that there is provided, through the use of an oligo containing a degenerate 
N,N,G/T or an N,N,G/C triplet sequence, 32 individual sequences that code for 20 possible 
amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is 
subjected to saturation mutagenesis using one such oligo, there are generated 32 distinct 
progeny polynucleotides encoding 20 distinct polypeptides. In contrast, the use of a non- 
degenerate oligo in site-directed mutagenesis leads to only one progeny polypeptide product 
per reaction vessel. 
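
The size of a complete single-substitution library can be worked out directly. The sketch below is illustrative only: the parental sequence is a hypothetical 100-residue string, and `single_substitution_variants` is a name introduced here. It distinguishes the count of residue/position pairs (which includes the parental residue at each position) from the count of sequences that actually differ from the parent.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acids


def single_substitution_variants(parent):
    """Yield every sequence differing from `parent` at exactly one position."""
    for position, wild_type in enumerate(parent):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                yield parent[:position] + residue + parent[position + 1:]


parent = "ACDEFGHIKL" * 10  # hypothetical 100-residue parental polypeptide
variants = set(single_substitution_variants(parent))
residue_position_pairs = len(AMINO_ACIDS) * len(parent)

print(residue_position_pairs, len(variants))
```

For a 100-residue parent there are 20 x 100 = 2000 residue/position pairs; 1900 of them are true substitutions, the remaining 100 reproduce the parental residue.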

This invention also provides for the use of nondegenerate oligos, which can 
optionally be used in combination with degenerate primers disclosed. It is appreciated that in 
some situations, it is advantageous to use nondegenerate oligos to generate specific point 
mutations in a working polynucleotide. This provides a means to generate specific silent 
point mutations, point mutations leading to corresponding amino acid changes, and point 
mutations that cause the generation of stop codons and the corresponding expression of 
polypeptide fragments. 

Thus, in a preferred embodiment of this invention, each saturation mutagenesis 
reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide molecules 
such that all 20 amino acids are represented at the one specific amino acid position 
corresponding to the codon position mutagenized in the parental polynucleotide. The 32-fold 
degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel 
can be subjected to clonal amplification (e.g. cloned into a suitable E. coli host using an 
expression vector) and subjected to expression screening. When an individual progeny 
polypeptide is identified by screening to display a favorable change in property (when 
compared to the parental polypeptide), it can be sequenced to identify the correspondingly 
favorable amino acid substitution contained therein. 

It is appreciated that upon mutagenizing each and every amino acid position in a 
parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid 
changes may be identified at more than one amino acid position. One or more new progeny 
molecules can be generated that contain a combination of all or part of these favorable amino 
acid substitutions. For example, if 2 specific favorable amino acid changes are identified in 
each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at 
each position (no change from the original amino acid, and each of two favorable changes) 
and 3 positions. Thus, there are 3 x 3 x 3 or 27 total possibilities, including 7 that were 
previously examined - 6 single point mutations (i.e. 2 at each of three positions) and no 
change at any position. 
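
The counting in this example can be reproduced by enumeration. In the sketch below the labels `wt`, `fav_a` and `fav_b` are placeholders for the original residue and the two favorable substitutions at each position; they are assumptions of the illustration, not data from the specification.

```python
from itertools import product

# Three positions, each offering the original residue plus two favorable changes.
OPTIONS_PER_POSITION = [("wt", "fav_a", "fav_b")] * 3

combinations = list(product(*OPTIONS_PER_POSITION))


def n_changes(combo):
    """Number of positions that differ from the original residue."""
    return sum(choice != "wt" for choice in combo)


# Unchanged parent plus single point mutants were already seen during screening.
previously_examined = [c for c in combinations if n_changes(c) <= 1]
new_combinations = [c for c in combinations if n_changes(c) >= 2]

print(len(combinations), len(previously_examined), len(new_combinations))
```

Of the 27 combinations, 7 were already examined (the parent and six single mutants), leaving 20 multiply substituted progeny to construct and test.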

In yet another aspect, site-saturation mutagenesis can be used together with shuffling, 
chimerization, recombination and other mutagenizing processes, along with screening. This 
invention provides for the use of any mutagenizing process(es), including saturation 
mutagenesis, in an iterative manner. In one exemplification, the iterative use of any 
mutagenizing process(es) is used in combination with screening. 

Thus, in a non-limiting exemplification, this invention provides for the use of 
saturation mutagenesis in combination with additional mutagenization processes, such as a 
process where two or more related polynucleotides are introduced into a suitable host cell 
such that a hybrid polynucleotide is generated by recombination and reductive reassortment. 

In addition to performing mutagenesis along the entire sequence of a gene, the instant 
invention provides that mutagenesis can be used to replace each of any number of bases in a 
polynucleotide sequence, wherein the number of bases to be mutagenized is preferably every 
integer from 15 to 100,000. Thus, instead of mutagenizing every position along a molecule, 
one can subject every or a discrete number of bases (preferably a subset totaling from 15 to 
100,000) to mutagenesis. Preferably, a separate nucleotide is used for mutagenizing each 
position or group of positions along a polynucleotide sequence. A group of 3 positions to be 
mutagenized may be a codon. The mutations are preferably introduced using a mutagenic 
primer, containing a heterologous cassette, also referred to as a mutagenic cassette. Preferred 
cassettes can have from 1 to 500 bases. Each nucleotide position in such heterologous 
cassettes can be N, A, C, G, T, A/C, A/G, A/T, C/G, C/T, G/T, C/G/T, A/G/T, A/C/T, A/C/G, or 
E, where E is any base that is not A, C, G, or T (E can be referred to as a designer oligo). 
The tables below show exemplary tri-nucleotide cassettes (there are over 3000 possibilities in 
addition to N,N,G/T and N,N,N and N,N,A/C). 
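
The number of possible tri-nucleotide cassettes follows from the number of options per position. The sketch below assumes the sixteen per-position options listed above, reading the tenth entry as C/T, and treats E as an opaque symbol; `POSITION_OPTIONS`, `count_cassettes` and `expand_cassette` are names introduced for illustration.

```python
from itertools import product

# Sixteen per-position options; "E" stands for a non-standard base and is
# kept as a single opaque symbol in this sketch.
POSITION_OPTIONS = {
    "N": "ACGT", "A": "A", "C": "C", "G": "G", "T": "T",
    "A/C": "AC", "A/G": "AG", "A/T": "AT", "C/G": "CG", "C/T": "CT", "G/T": "GT",
    "C/G/T": "CGT", "A/G/T": "AGT", "A/C/T": "ACT", "A/C/G": "ACG",
    "E": "E",
}


def count_cassettes(length):
    """Number of distinct cassette specifications of the given length."""
    return len(POSITION_OPTIONS) ** length


def expand_cassette(spec):
    """Expand a specification such as "N,N,G/T" into its concrete sequences."""
    pools = [POSITION_OPTIONS[token.strip()] for token in spec.split(",")]
    return ["".join(seq) for seq in product(*pools)]


print(count_cassettes(3))
print(len(expand_cassette("N,N,G/T")), expand_cassette("A,C/G/T,A/T"))
```

Sixteen options at each of three positions give 16^3 = 4096 specifications, consistent with the statement that more than 3000 exist beyond the three named.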

In a general sense, saturation mutagenesis is comprised of mutagenizing a complete 
set of mutagenic cassettes (wherein each cassette is preferably 1-500 bases in length) in a 
defined polynucleotide sequence to be mutagenized (wherein the sequence to be mutagenized 
is preferably from 15 to 100,000 bases in length). Thus, a group of mutations (ranging 
from 1 to 100 mutations) is introduced into each cassette to be mutagenized. A grouping of 
mutations to be introduced into one cassette can be different from or the same as a second 
grouping of mutations to be introduced into a second cassette during the application of one 
round of saturation mutagenesis. Such groupings are exemplified by deletions, additions, 
groupings of particular codons, and groupings of particular nucleotide cassettes. 

Defined sequences to be mutagenized (see Fig. 20) include preferably a whole gene, 
pathway, cDNA, an entire open reading frame (ORF), an entire promoter, enhancer, 
repressor/transactivator, origin of replication, intron, operator, or any polynucleotide 
functional group. Generally, a preferred "defined sequence" for this purpose may be any 
polynucleotide that is a 15 base polynucleotide sequence, and polynucleotide sequences of 
lengths between 15 bases and 15,000 bases (this invention specifically names every integer in 
between). Considerations in choosing groupings of codons include types of amino acids 
encoded by a degenerate mutagenic cassette. 

In a particularly preferred exemplification of a grouping of mutations that can be 
introduced into a mutagenic cassette (see Tables 1-85), this invention specifically provides 
for degenerate codon substitutions (using degenerate oligos) that code for 2, 3, 4, 5, 6, 7, 8, 9, 
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20 amino acids at each position, and a library of 
polypeptides encoded thereby. 

Chimerizations 
"Shuffling" 

Nucleic acid shuffling is a method for in vitro or in vivo homologous recombination 
of pools of shorter or smaller polynucleotides to produce a polynucleotide or 
polynucleotides. Mixtures of related nucleic acid sequences or polynucleotides are subjected 
to sexual PCR to provide random polynucleotides, and reassembled to yield a library or 
mixed population of recombinant hybrid nucleic acid molecules or polynucleotides. 

In contrast to cassette mutagenesis, only shuffling and error-prone PCR allow one to 
mutate a pool of sequences blindly (without sequence information other than primers). 

The advantage of the mutagenic shuffling of this invention over error-prone PCR 
alone for repeated selection can best be explained with an example from antibody 
engineering. Consider DNA shuffling as compared with error-prone PCR (not sexual PCR). 
The initial library of selected pooled sequences can consist of related sequences of diverse 
origin (i.e. antibodies from naive mRNA) or can be derived by any type of mutagenesis 
(including shuffling) of a single antibody gene. A collection of selected complementarity 
determining regions ("CDRs") is obtained after the first round of affinity selection. In the 
diagram the thick CDRs confer onto the antibody molecule increased affinity for the antigen. 
Shuffling allows the free combinatorial association of all of the CDR1s with all of the 
CDR2s with all of the CDR3s, for example. 

This method differs from error-prone PCR, in that it is an inverse chain reaction. In 
error-prone PCR, the number of polymerase start sites and the number of molecules grows 
exponentially. However, the sequence of the polymerase start sites and the sequence of the 
molecules remains essentially the same. In contrast, in nucleic acid reassembly or shuffling 
of random polynucleotides the number of start sites and the number (but not size) of the 
random polynucleotides decreases over time. For polynucleotides derived from whole 
plasmids the theoretical endpoint is a single, large concatemeric molecule. 

Since cross-overs occur at regions of homology, recombination will primarily occur 
between members of the same sequence family. This discourages combinations of CDRs 
that are grossly incompatible (e.g., directed against different epitopes of the same antigen). It 
is contemplated that multiple families of sequences can be shuffled in the same reaction. 
Further, shuffling generally conserves the relative order, such that, for example, CDR1 will 
not be found in the position of CDR2. 
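
The two properties stated here, that cross-overs arise at regions of homology and that relative order is conserved, can be illustrated with a toy in silico model. The sketch below is not the laboratory procedure: it assumes pre-aligned parents of equal length and a fixed switching probability, and the names `shuffle_once`, `min_identity` and `switch_prob` are hypothetical.

```python
import random


def shuffle_once(parents, min_identity=4, switch_prob=0.3, rng=None):
    """Build one chimera from aligned, equal-length parent sequences.

    The template may switch to another parent only where the preceding
    `min_identity` bases are identical in both, mimicking cross-overs that
    occur at regions of homology.
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    current = rng.randrange(len(parents))
    chimera = []
    for i in range(length):
        if i >= min_identity and rng.random() < switch_prob:
            candidate = rng.randrange(len(parents))
            window = slice(i - min_identity, i)
            if parents[candidate][window] == parents[current][window]:
                current = candidate
        chimera.append(parents[current][i])
    return "".join(chimera)


if __name__ == "__main__":
    parents = ["ACGTACGTAAACGTACGT", "ACGTACGTCCACGTACGT", "ACGTTCGTAAACGTACGA"]
    rng = random.Random(1)
    print([shuffle_once(parents, rng=rng) for _ in range(5)])
```

Because every base of a chimera is copied from the same position of some parent, a segment can never move to a different position, which corresponds to CDR1 never appearing in the place of CDR2.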

Rare shufflants will contain a large number of the best (e.g., highest affinity) CDRs 
and these rare shufflants may be selected based on their superior affinity. 

CDRs from a pool of 100 different selected antibody sequences can be permutated in 
up to 100^6 different ways. This large number of permutations cannot be represented in a 
single library of DNA sequences. Accordingly, it is contemplated that multiple cycles of 
DNA shuffling and selection may be required depending on the length of the sequence and 
the sequence diversity desired. 
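
The scale of this number is easy to verify. The sketch below reads the count as 100 raised to the sixth power, one factor for each of the six CDRs of an antibody (three in the heavy-chain and three in the light-chain variable region). The library size used for comparison is a purely hypothetical figure chosen for illustration.

```python
def cdr_combinations(n_sequences, n_cdrs=6):
    """Ways to choose one variant per CDR position from n_sequences donors."""
    return n_sequences ** n_cdrs


total = cdr_combinations(100)
hypothetical_library_size = 10 ** 9  # assumed for illustration only

print(f"{total:.0e}")
print(hypothetical_library_size / total)  # fraction of combinations one library could hold
```

Under these assumptions a single library would sample only a small fraction of the 10^12 possible combinations, which is why repeated cycles of shuffling and selection are contemplated.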

Error-prone PCR, in contrast, keeps all the selected CDRs in the same relative 
sequence, generating a much smaller mutant cloud. 

The template polynucleotide which may be used in the methods of this invention may 
be DNA or RNA. It may be of various lengths depending on the size of the gene or shorter 
or smaller polynucleotide to be recombined or reassembled. Preferably, the template 
polynucleotide is from 50 bp to 50 kb. It is contemplated that entire vectors containing the 
nucleic acid encoding the protein of interest can be used in the methods of this invention, and 
in fact have been successfully used. 

The template polynucleotide may be obtained by amplification using the PCR 
reaction (USPN 4,683,202 and USPN 4,683,195) or other amplification or cloning methods. 
However, the removal of free primers from the PCR products before subjecting them to 
pooling of the PCR products and sexual PCR may provide more efficient results. Failure to 
adequately remove the primers from the original pool before sexual PCR can lead to a low 
frequency of crossover clones. 

The template polynucleotide often should be double-stranded. A double-stranded 
nucleic acid molecule is recommended to ensure that regions of the resulting single-stranded 
polynucleotides are complementary to each other and thus can hybridize to form a 
double-stranded molecule. 

It is contemplated that single-stranded or double-stranded nucleic acid 
polynucleotides having regions of identity to the template polynucleotide and regions of 
heterology to the template polynucleotide may be added to the template polynucleotide, at 
this step. It is also contemplated that two different but related polynucleotide templates can 
be mixed at this step. 

The double-stranded polynucleotide template and any added double- or 
single-stranded polynucleotides are subjected to sexual PCR which includes slowing or 
halting to provide a mixture of random polynucleotides of from about 5 bp to 5 kb or more. 
Preferably the size of the random polynucleotides is from about 10 bp to 1000 bp, more 
preferably the size of the polynucleotides is from about 20 bp to 500 bp. 

Alternatively, it is also contemplated that double-stranded nucleic acid having 
multiple nicks may be used in the methods of this invention. A nick is a break in one strand 
of the double-stranded nucleic acid. The distance between such nicks is preferably 5 bp to 5 
kb, more preferably between 10 bp to 1000 bp. This can provide areas of self-priming to 
produce shorter or smaller polynucleotides to be included with the polynucleotides resulting 
from random primers, for example. 

The concentration of any one specific polynucleotide will not be greater than 1% by 
weight of the total polynucleotides, more preferably the concentration of any one specific 
nucleic acid sequence will not be greater than 0.1% by weight of the total nucleic acid. 

The number of different specific polynucleotides in the mixture will be at least about 
100, preferably at least about 500, and more preferably at least about 1000. 

At this step single-stranded or double-stranded polynucleotides, either synthetic or 
natural, may be added to the random double-stranded shorter or smaller polynucleotides in 
order to increase the heterogeneity of the mixture of polynucleotides. 

It is also contemplated that populations of double-stranded randomly broken 
polynucleotides may be mixed or combined at this step with the polynucleotides from the 
sexual PCR process and optionally subjected to one or more additional sexual PCR cycles. 

Where insertion of mutations into the template polynucleotide is desired, 
single-stranded or double-stranded polynucleotides having a region of identity to the 
template polynucleotide and a region of heterology to the template polynucleotide may be 
added in a 20 fold excess by weight as compared to the total nucleic acid, more preferably 
the single-stranded polynucleotides may be added in a 10 fold excess by weight as compared 
to the total nucleic acid. 

Where a mixture of different but related template polynucleotides is desired, 
populations of polynucleotides from each of the templates may be combined at a ratio of less 
than about 1:100, more preferably the ratio is less than about 1:40. For example, a backcross 
of the wild-type polynucleotide with a population of mutated polynucleotide may be desired 
to eliminate neutral mutations (e.g., mutations yielding an insubstantial alteration in the 
phenotypic property being selected for). In such an example, the ratio of randomly provided 
wild-type polynucleotides which may be added to the randomly provided sexual PCR cycle 
hybrid polynucleotides is approximately 1:1 to about 100:1, and more preferably from 1:1 to 
40:1. 

The mixed population of random polynucleotides is denatured to form 
single-stranded polynucleotides and then re-annealed. Only those single-stranded 
polynucleotides having regions of homology with other single-stranded polynucleotides will 
re-anneal. 

The random polynucleotides may be denatured by heating. One skilled in the art 
could determine the conditions necessary to completely denature the double-stranded nucleic 
acid. Preferably the temperature is from 80 °C to 100 °C, more preferably the temperature is 
from 90 °C to 96 °C. Other methods which may be used to denature the polynucleotides 
include pressure (36) and pH. 

The polynucleotides may be re-annealed by cooling. Preferably the temperature is 
from 20 °C to 75 °C, more preferably the temperature is from 40 °C to 65 °C. If a high 
frequency of crossovers is needed based on an average of only 4 consecutive bases of 
homology, recombination can be forced by using a low annealing temperature, although the 
process becomes more difficult. The degree of renaturation which occurs will depend on the 
degree of homology between the population of single-stranded polynucleotides. 

Renaturation can be accelerated by the addition of polyethylene glycol ("PEG") or 
salt. The salt concentration is preferably from 0 mM to 200 mM, more preferably the salt 
concentration is from 10 mM to 100 mM. The salt may be KCl or NaCl. The concentration 
of PEG is preferably from 0% to 20%, more preferably from 5% to 10%. 

The annealed polynucleotides are next incubated in the presence of a nucleic acid 
polymerase and dNTPs (i.e. dATP, dCTP, dGTP and dTTP). The nucleic acid polymerase 
may be the Klenow fragment, the Taq polymerase or any other DNA polymerase known in 
the art. 

The approach to be used for the assembly depends on the minimum degree of 
homology that should still yield crossovers. If the areas of identity are large, Taq polymerase 
can be used with an annealing temperature of between 45-65 °C. If the areas of identity are 
small, Klenow polymerase can be used with an annealing temperature of between 20-30 °C. 
One skilled in the art could vary the temperature of annealing to increase the number of 
cross-overs achieved. 

The polymerase may be added to the random polynucleotides prior to annealing, 
simultaneously with annealing or after annealing. 

The cycle of denaturation, renaturation and incubation in the presence of polymerase 
is referred to herein as shuffling or reassembly of the nucleic acid. This cycle is repeated for 
a desired number of times. Preferably the cycle is repeated from 2 to 50 times, more 
preferably the cycle is repeated from 10 to 40 times. 
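
The preferred ranges given for the reassembly cycle can be gathered into a simple parameter check. The following sketch is illustrative only: the class `ReassemblyProgram` and its field names are assumptions, and only the denaturation temperature, annealing temperature and cycle count ranges quoted in the text are encoded. No extension temperature is modelled because none is stated.

```python
from dataclasses import dataclass

# Preferred ranges quoted in the text (degrees Celsius and cycle counts).
DENATURE_RANGE_C = (80.0, 100.0)
ANNEAL_RANGE_C = (20.0, 75.0)
CYCLE_RANGE = (2, 50)


@dataclass(frozen=True)
class ReassemblyProgram:
    denature_c: float
    anneal_c: float
    cycles: int

    def problems(self):
        """Return the settings that fall outside the preferred ranges."""
        issues = []
        if not DENATURE_RANGE_C[0] <= self.denature_c <= DENATURE_RANGE_C[1]:
            issues.append("denaturation temperature")
        if not ANNEAL_RANGE_C[0] <= self.anneal_c <= ANNEAL_RANGE_C[1]:
            issues.append("annealing temperature")
        if not CYCLE_RANGE[0] <= self.cycles <= CYCLE_RANGE[1]:
            issues.append("cycle count")
        return issues

    def steps(self):
        """Expand the program into the ordered list of step names."""
        return ["denature", "anneal", "extend"] * self.cycles


if __name__ == "__main__":
    program = ReassemblyProgram(denature_c=94.0, anneal_c=55.0, cycles=30)
    print(program.problems(), len(program.steps()))
```

A program such as 94 °C denaturation, 55 °C annealing and 30 cycles falls within all three preferred ranges; values outside them are reported by name rather than rejected, since the text presents the ranges as preferences.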

The resulting nucleic acid is a larger double-stranded polynucleotide of from about 50 
bp to about 100 kb, preferably the larger polynucleotide is from 500 bp to 50 kb. 

This larger polynucleotide may contain a number of copies of a polynucleotide 
having the same size as the template polynucleotide in tandem. This concatemeric 
polynucleotide is then denatured into single copies of the template polynucleotide. The result 
will be a population of polynucleotides of approximately the same size as the template 
polynucleotide. The population will be a mixed population where single or double-stranded 
polynucleotides having an area of identity and an area of heterology have been added to the 
template polynucleotide prior to shuffling. These polynucleotides are then cloned into the 
appropriate vector and the ligation mixture used to transform bacteria. 

It is contemplated that the single polynucleotides may be obtained from the larger 
concatemeric polynucleotide by amplification of the single polynucleotide prior to cloning by 
a variety of methods including PCR (USPN 4,683,195 and USPN 4,683,202), rather than by 
digestion of the concatemer. 

The vector used for cloning is not critical provided that it will accept a polynucleotide 
of the desired size. If expression of the particular polynucleotide is desired, the cloning 
vehicle should further comprise transcription and translation signals next to the site of 
insertion of the polynucleotide to allow expression of the polynucleotide in the host cell. 
Preferred vectors include the pUC series and the pBR series of plasmids. 

The resulting bacterial population will include a number of recombinant 
polynucleotides having random mutations. This mixed population may be tested to identify 
the desired recombinant polynucleotides. The method of selection will depend on the 
polynucleotide desired. 

For example, if a polynucleotide which encodes a protein with increased binding j j 

i 

20 efficiency to a ligand is desired, the proteins expressed by each of the portions of the 
polynucleotides in the population or library may be tested for their ability to bind to the 
ligand by methods known in the art (i.e. panning, affinity chromatography). If a 

I £ 

polynucleotide which encodes for a protein with increased drug resistance is desired, the 
proteins expressed by each of the polynucleotides in the population or library may be tested 
25 for their ability to confer drug resistance to the host organism. One skilled in the art, given J 
knowledge of the desired protein, could readily test the population to identify 
polynucleotides which confer the desired properties onto the protein. 

It is contemplated that one skilled in the art could use a phage display system in
which fragments of the protein are expressed as fusion proteins on the phage surface
(Pharmacia, Milwaukee WI). The recombinant DNA molecules are cloned into the phage
DNA at a site which results in the transcription of a fusion protein, a portion of which is
encoded by the recombinant DNA molecule. The phage containing the recombinant nucleic
acid molecule undergoes replication and transcription in the cell. The leader sequence of the
fusion protein directs the transport of the fusion protein to the tip of the phage particle. Thus
the fusion protein which is partially encoded by the recombinant DNA molecule is displayed
on the phage particle for detection and selection by the methods described above.

It is further contemplated that a number of cycles of nucleic acid shuffling may be
conducted with polynucleotides from a sub-population of the first population, which
sub-population contains DNA encoding the desired recombinant protein. In this manner, proteins
with even higher binding affinities or enzymatic activity could be achieved.

It is also contemplated that a number of cycles of nucleic acid shuffling may be
conducted with a mixture of wild-type polynucleotides and a sub-population of nucleic acid
from the first or subsequent rounds of nucleic acid shuffling in order to remove any silent
mutations from the sub-population.

Any source of nucleic acid, in purified form, can be utilized as the starting nucleic
acid. Thus the process may employ DNA or RNA including messenger RNA, which DNA
or RNA may be single or double stranded. In addition, a DNA-RNA hybrid which contains
one strand of each may be utilized. The nucleic acid sequence may be of various lengths
depending on the size of the nucleic acid sequence to be mutated. Preferably the specific
nucleic acid sequence is from 50 to 50000 base pairs. It is contemplated that entire vectors
containing the nucleic acid encoding the protein of interest may be used in the methods of
this invention.

The nucleic acid may be obtained from any source, for example, from plasmids such
as pBR322, from cloned DNA or RNA or from natural DNA or RNA from any source
including bacteria, yeast, viruses and higher organisms such as plants or animals. DNA or
RNA may be extracted from blood or tissue material. The template polynucleotide may be
obtained by amplification using the polymerase chain reaction (PCR, see USPN
4,683,202 and USPN 4,683,195). Alternatively, the polynucleotide may be present in a
vector present in a cell and sufficient nucleic acid may be obtained by culturing the cell and
extracting the nucleic acid from the cell by methods known in the art.

Any specific nucleic acid sequence can be used to produce the population of hybrids
by the present process. It is only necessary that a small population of hybrid sequences of the
specific nucleic acid sequence exist or be created prior to the present process.

The initial small population of the specific nucleic acid sequences having mutations
may be created by a number of different methods. Mutations may be created by error-prone
PCR. Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level
of point mutations randomly over a long sequence. Alternatively, mutations can be
introduced into the template polynucleotide by oligonucleotide-directed mutagenesis. In
oligonucleotide-directed mutagenesis, a short sequence of the polynucleotide is removed
from the polynucleotide using restriction enzyme digestion and is replaced with a synthetic
polynucleotide in which various bases have been altered from the original sequence. The
polynucleotide sequence can also be altered by chemical mutagenesis. Chemical mutagens
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic
acid. Other agents which are analogues of nucleotide precursors include nitrosoguanidine,
5-bromouracil, 2-aminopurine, or acridine. Generally, these agents are added to the PCR
reaction in place of the nucleotide precursor thereby mutating the sequence. Intercalating
agents such as proflavine, acriflavine, quinacrine and the like can also be used. Random
mutagenesis of the polynucleotide sequence can also be achieved by irradiation with X-rays
or ultraviolet light. Generally, plasmid polynucleotides so mutagenized are introduced into
E. coli and propagated as a pool or library of hybrid plasmids.
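
The error-prone PCR step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the template sequence, the per-base error rate, the pool size and the function names are hypothetical, and substitutions are drawn uniformly, which real low-fidelity polymerases do not do.

```python
import random

BASES = "ACGT"

def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA string, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Assumption: the replacement is one of the three other bases, chosen uniformly.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)

def mutant_pool(template: str, error_rate: float, size: int, seed: int = 0) -> list[str]:
    """Build a small pool of point-mutated copies of one template."""
    rng = random.Random(seed)
    return [error_prone_copy(template, error_rate, rng) for _ in range(size)]

# Hypothetical 21-base template and a low per-base error rate.
pool = mutant_pool("ATGGCCATTGTAATGGGCCGC", error_rate=0.02, size=5)
for variant in pool:
    print(variant)
```

Every copy keeps the template length, because only point substitutions are modeled; insertions and deletions are left out of this sketch.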

Alternatively, the small mixed population of specific nucleic acids may be found in
nature in that they may consist of different alleles of the same gene or the same gene from
different related species (i.e., cognate genes). Alternatively, they may be related DNA
sequences found within one species, for example, the immunoglobulin genes.

Once the mixed population of the specific nucleic acid sequences is generated, the 
polynucleotides can be used directly or inserted into an appropriate cloning vector, using 
techniques well-known in the art.

The choice of vector depends on the size of the polynucleotide sequence and the host
cell to be employed in the methods of this invention. The templates of this invention may be
plasmids, phages, cosmids, phagemids, viruses (e.g., retroviruses, parainfluenzavirus,
herpesviruses, reoviruses, paramyxoviruses, and the like), or selected portions thereof (e.g.,
coat protein, spike glycoprotein, capsid protein). For example, cosmids and phagemids are
preferred where the specific nucleic acid sequence to be mutated is larger because these
vectors are able to stably propagate large polynucleotides.

If the mixed population of the specific nucleic acid sequence is cloned into a vector it
can be clonally amplified by inserting each vector into a host cell and allowing the host cell
to amplify the vector. This is referred to as clonal amplification because while the absolute
number of nucleic acid sequences increases, the number of hybrids does not increase. Utility
can be readily determined by screening expressed polypeptides.

The DNA shuffling method of this invention can be performed blindly on a pool of 
unknown sequences. By adding to the reassembly mixture oligonucleotides (with ends that
are homologous to the sequences being reassembled) any sequence mixture can be 
incorporated at any specific position into another sequence mixture. Thus, it is contemplated 
that mixtures of synthetic oligonucleotides, PCR polynucleotides or even whole genes can be 
mixed into another sequence library at defined positions. The insertion of one sequence 
(mixture) is independent from the insertion of a sequence in another part of the template. 
Thus, the degree of recombination, the homology required, and the diversity of the library
can be independently and simultaneously varied along the length of the reassembled DNA. 

This approach of mixing two genes may be useful for the humanization of antibodies 
from murine hybridomas. The approach of mixing two genes or inserting alternative 
sequences into genes may be useful for any therapeutically used protein, for example, 
interleukin I, antibodies, tPA and growth hormone. The approach may also be useful in any
nucleic acid, for example, promoters or introns or 3' untranslated regions or 5' untranslated
regions of genes to increase expression or alter specificity of expression of proteins. The
approach may also be used to mutate ribozymes or aptamers. 

Shuffling requires the presence of homologous regions separating regions of 
diversity. Scaffold-like protein structures may be particularly suitable for shuffling. The 
conserved scaffold determines the overall folding by self-association, while displaying
relatively unrestricted loops that mediate the specific binding. Examples of such scaffolds 
are the immunoglobulin beta-barrel, and the four-helix bundle which are well-known in the 
art. This shuffling can be used to create scaffold-like proteins with various combinations of 
mutated sequences for binding. 
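
The requirement for homologous regions separating regions of diversity can be sketched as a single template switch at a shared block. This is an illustrative toy, not the reassembly reaction itself: the sequences, the `min_len` threshold and the function name are hypothetical, and only exact matches count as homology.

```python
def crossover_at_homology(parent_a: str, parent_b: str, min_len: int = 6) -> list[str]:
    """Return chimeras that start as parent_a and switch to parent_b at a shared block.

    Assumption: a 'homologous region' is an exact match of at least min_len bases.
    """
    chimeras = set()
    for i in range(len(parent_a) - min_len + 1):
        block = parent_a[i:i + min_len]
        j = parent_b.find(block)
        if j != -1:
            # Keep parent_a up to the shared block, then continue along parent_b.
            chimeras.add(parent_a[:i] + parent_b[j:])
    # A switch that merely regenerates a parent is not a new hybrid.
    chimeras.discard(parent_a)
    chimeras.discard(parent_b)
    return sorted(chimeras)

# Two hypothetical parents: different flanks (diversity) around one conserved block.
a = "AAAT" + "GGCCGG" + "TTTA"
b = "CACA" + "GGCCGG" + "ACAC"
print(crossover_at_homology(a, b))        # one chimera joined at the conserved block
print(crossover_at_homology("AAAAAAAA", "CCCCCCCC"))  # no shared block, no chimera
```

Without a shared block the function returns an empty list, which mirrors the statement that shuffling depends on homologous regions between the regions of diversity.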
In vitro Shuffling

The equivalents of some standard genetic matings may also be performed by
shuffling in vitro. For example, a "molecular backcross" can be performed by repeatedly
mixing the hybrid's nucleic acid with the wild-type nucleic acid while selecting for the
mutations of interest. As in traditional breeding, this approach can be used to combine
phenotypes from different sources into a background of choice. It is useful, for example, for
the removal of neutral mutations that affect unselected characteristics (e.g., immunogenicity).
Thus it can be useful to determine which mutations in a protein are involved in the enhanced
biological activity and which are not, an advantage which cannot be achieved by error-prone
mutagenesis or cassette mutagenesis methods.

Large, functional genes can be assembled correctly from a mixture of small random 
polynucleotides. This reaction may be of use for the reassembly of genes from the highly 
fragmented DNA of fossils. In addition random nucleic acid fragments from fossils may be 
combined with polynucleotides from similar genes from related species. 

It is also contemplated that the method of this invention can be used for the in vitro 
amplification of a whole genome from a single cell as is needed for a variety of research and 
diagnostic applications. DNA amplification by PCR is in practice limited to a length of 
about 40 kb. Amplification of a whole genome such as that of E. coli (5,000 kb) by PCR
would require about 250 primers yielding 125 forty kb polynucleotides. This approach is not 
practical due to the unavailability of sufficient sequence data. On the other hand, random 
production of polynucleotides of the genome with sexual PCR cycles, followed by gel 
purification of small polynucleotides will provide a multitude of possible primers. Use of 
this mix of random small polynucleotides as primers in a PCR reaction alone or with the 
whole genome as the template should result in an inverse chain reaction with the theoretical 
endpoint of a single concatamer containing many copies of the genome. 
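
The primer count quoted above follows from simple tiling arithmetic, which can be checked with a short calculation. The function name is hypothetical, and the sketch assumes non-overlapping amplicons with one forward and one reverse primer each.

```python
import math

def pcr_tiling(genome_kb: float, max_amplicon_kb: float) -> tuple[int, int]:
    """Return (amplicons, primers) needed to tile a genome with non-overlapping PCR products."""
    amplicons = math.ceil(genome_kb / max_amplicon_kb)
    primers = 2 * amplicons  # one forward and one reverse primer per amplicon
    return amplicons, primers

# Figures from the text: a 5,000 kb genome and a practical PCR limit of about 40 kb.
amplicons, primers = pcr_tiling(genome_kb=5000, max_amplicon_kb=40)
print(amplicons, primers)  # 125 250
```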

A 100-fold amplification in the copy number and an average polynucleotide size of
greater than 50 kb may be obtained when only random polynucleotides are used. It is 
thought that the larger concatamer is generated by overlap of many smaller polynucleotides. 
The quality of specific PCR products obtained using synthetic primers will be 
indistinguishable from the product obtained from unamplified DNA. It is expected that this 
approach will be useful for the mapping of genomes. 

The polynucleotide to be shuffled can be produced as random or non-random
polynucleotides, at the discretion of the practitioner. Moreover, this invention provides a
method of shuffling that is applicable to a wide range of polynucleotide sizes and types,
including the step of generating polynucleotide monomers to be used as building blocks in
the reassembly of a larger polynucleotide. For example, the building blocks can be
fragments of genes or they can be comprised of entire genes or gene pathways, or any
combination thereof.
In Vivo Shuffling 

In an embodiment of in vivo shuffling, the mixed population of the specific nucleic
acid sequence is introduced into bacterial or eukaryotic cells under conditions such that at
least two different nucleic acid sequences are present in each host cell. The polynucleotides
can be introduced into the host cells by a variety of different methods. The host cells can be
transformed with the smaller polynucleotides using methods known in the art, for example
treatment with calcium chloride. If the polynucleotides are inserted into a phage genome, the
host cell can be transfected with the recombinant phage genome having the specific nucleic
acid sequences. Alternatively, the nucleic acid sequences can be introduced into the host cell
using electroporation, transfection, lipofection, biolistics, conjugation, and the like.

In general, in this embodiment, the specific nucleic acid sequences will be present in
vectors which are capable of stably replicating the sequence in the host cell. In addition, it is
contemplated that the vectors will encode a marker gene such that host cells having the
vector can be selected. This ensures that the mutated specific nucleic acid sequence can be
recovered after introduction into the host cell. However, it is contemplated that the entire
mixed population of the specific nucleic acid sequences need not be present on a vector
sequence. Rather only a sufficient number of sequences need be cloned into vectors to
ensure that after introduction of the polynucleotides into the host cells each host cell contains
one vector having at least one specific nucleic acid sequence present therein. It is also
contemplated that rather than having a subset of the population of the specific nucleic acid
sequences cloned into vectors, this subset may be already stably integrated into the host cell.

It has been found that when two polynucleotides which have regions of identity are
inserted into the host cells, homologous recombination occurs between the two
polynucleotides. Such recombination between the two mutated specific nucleic acid
sequences will result in the production of double or triple hybrids in some situations.

It has also been found that the frequency of recombination is increased if some of the 
mutated specific nucleic acid sequences are present on linear nucleic acid molecules. In one 
aspect, some of the specific nucleic acid sequences are present on linear polynucleotides.

After transformation, the host cell transformants are placed under selection to identify 
those host cell transformants which contain mutated specific nucleic acid sequences having 



WO 02/092780 



PCT/US02/15767 



the qualities desired. For example, if increased resistance to a particular drug is desired then
the transformed host cells may be subjected to increased concentrations of the particular drug
and those transformants producing mutated proteins able to confer increased drug resistance
will be selected. If the enhanced ability of a particular protein to bind to a receptor is desired,
then expression of the protein can be induced from the transformants and the resulting
protein assayed in a ligand binding assay by methods known in the art to identify that subset
of the mutated population which shows enhanced binding to the ligand. Alternatively, the
protein can be expressed in another system to ensure proper processing.

Once a subset of the first recombined specific nucleic acid sequences (daughter
sequences) having the desired characteristics is identified, it is then subjected to a second
round of recombination.

In the second cycle of recombination, the recombined specific nucleic acid sequences
may be mixed with the original mutated specific nucleic acid sequences (parent sequences)
and the cycle repeated as described above. In this way a set of second recombined specific
nucleic acid sequences can be identified which have enhanced characteristics or encode
proteins having enhanced properties. This cycle can be repeated a number of times as
desired.

It is also contemplated that in the second or subsequent recombination cycle, a
backcross can be performed. A molecular backcross can be performed by mixing the desired
specific nucleic acid sequences with a large number of the wild-type sequence, such that at
least one wild-type nucleic acid sequence and a mutated nucleic acid sequence are present in
the same host cell after transformation. Recombination with the wild-type specific nucleic
acid sequence will eliminate those neutral mutations that may affect unselected
characteristics such as immunogenicity but not the selected characteristics.

In another embodiment of this invention, it is contemplated that during the first round
a subset of the specific nucleic acid sequences can be generated as smaller polynucleotides
by slowing or halting their PCR amplification prior to introduction into the host cell. The
size of the polynucleotides must be large enough to contain some regions of identity with the
other sequences so as to homologously recombine with the other sequences. The size of the
polynucleotides will range from 0.03 kb to 100 kb, more preferably from 0.2 kb to 10 kb. It
is also contemplated that in subsequent rounds, all of the specific nucleic acid sequences
other than the sequences selected from the previous round may be utilized to generate PCR
polynucleotides prior to introduction into the host cells.

The shorter polynucleotide sequences can be single-stranded or double-stranded. If 
the sequences were originally single-stranded and have become double-stranded they can be 
denatured with heat, chemicals or enzymes prior to insertion into the host cell. The reaction 
conditions suitable for separating the strands of nucleic acid are well known in the art. 

The steps of this process can be repeated indefinitely, being limited only by the 
number of possible hybrids which can be achieved. After a certain number of cycles, all 
possible hybrids will have been achieved and further cycles are redundant.
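
The ceiling on useful cycles can be made concrete by counting the combinations available from a given set of mutations. The sketch below assumes each mutation is simply present or absent in a hybrid, so n mutations give 2^n combinations including wild type; the mutation labels and the function name are hypothetical.

```python
from itertools import combinations

def possible_hybrids(mutations: list[str]) -> list[frozenset[str]]:
    """Enumerate every combination of the given point mutations, including wild type."""
    hybrids = []
    for k in range(len(mutations) + 1):
        hybrids.extend(frozenset(c) for c in combinations(mutations, k))
    return hybrids

# Three hypothetical point mutations in the parental pool.
hybrids = possible_hybrids(["A12V", "G45D", "K78R"])
print(len(hybrids))  # 8, i.e. 2**3
```

Once every member of this set has been produced, further recombination cycles can only regenerate hybrids already seen.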

In an embodiment the same mutated template nucleic acid is repeatedly recombined 
and the resulting recombinants selected for the desired characteristic. 

Therefore, the initial pool or population of mutated template nucleic acid is cloned 
into a vector capable of replicating in a bacterium such as E. coli. The particular vector is not
essential, so long as it is capable of autonomous replication in E. coli. In a preferred 
embodiment, the vector is designed to allow the expression and production of any protein 
encoded by the mutated specific nucleic acid linked to the vector. It is also preferred that the 
vector contain a gene encoding for a selectable marker. 

The population of vectors containing the pool of mutated nucleic acid sequences is 
introduced into the E. coli host cells. The vector nucleic acid sequences may be introduced
by transformation, transfection or infection in the case of phage. The concentration of 
vectors used to transform the bacteria is such that a number of vectors is introduced into each 
cell. Once present in the cell, the efficiency of homologous recombination is such that 
homologous recombination occurs between the various vectors. This results in the 
generation of hybrids (daughters) having a combination of mutations which differ from the 
original parent mutated sequences. 

The host cells are then clonally replicated and selected for the marker gene present on 
the vector. Only those cells having a plasmid will grow under the selection. 

The host cells which contain a vector are then tested for the presence of favorable 
mutations. Such testing may consist of placing the cells under selective pressure, for 
example, if the gene to be selected is an improved drug resistance gene. If the vector allows 
expression of the protein encoded by the mutated nucleic acid sequence, then such selection 
may include allowing expression of the protein so encoded, isolation of the protein and
testing of the protein to determine whether, for example, it binds with increased efficiency to
the ligand of interest.

Once a particular daughter mutated nucleic acid sequence has been identified which 
confers the desired characteristics, the nucleic acid is isolated either already linked to the 
vector or separated from the vector. This nucleic acid is then mixed with the first or parent
population of nucleic acids and the cycle is repeated. 

It has been shown that by this method nucleic acid sequences having enhanced 
desired properties can be selected. 

In an alternate embodiment, the first generation of hybrids are retained in the cells 
and the parental mutated sequences are added again to the cells. Accordingly, the first cycle
of Embodiment I is conducted as described above. However, after the daughter nucleic acid 
sequences are identified, the host cells containing these sequences are retained. 

The parent mutated specific nucleic acid population, either as polynucleotides or
cloned into the same vector, is introduced into the host cells already containing the daughter
nucleic acids. Recombination is allowed to occur in the cells and the next generation of
recombinants, or granddaughters, are selected by the methods described above.

This cycle can be repeated a number of times until the nucleic acid or peptide having 
the desired characteristics is obtained. It is contemplated that in subsequent cycles, the 
population of mutated sequences which are added to the preferred hybrids may come from 
the parental hybrids or any subsequent generation.

In an alternative embodiment, the invention provides a method of conducting a 
"molecular" backcross of the obtained recombinant specific nucleic acid in order to eliminate 
any neutral mutations. Neutral mutations are those mutations which do not confer onto the 
nucleic acid or peptide the desired properties. Such mutations may however confer on the 
nucleic acid or peptide undesirable characteristics. Accordingly, it is desirable to eliminate
such neutral mutations. The method of this invention provides a means of doing so.

In this embodiment, after the hybrid nucleic acid, having the desired characteristics, is
obtained by the methods of the embodiments, the nucleic acid, the vector having the nucleic
acid or the host cell containing the vector and nucleic acid is isolated.

The nucleic acid or vector is then introduced into the host cell with a large excess of
the wild-type nucleic acid. The nucleic acid of the hybrid and the nucleic acid of the
wild-type sequence are allowed to recombine. The resulting recombinants are placed under
the same selection as the hybrid nucleic acid. Only those recombinants which retained the
desired characteristics will be selected. Any silent mutations which do not provide the
desired characteristics will be lost through recombination with the wild-type DNA. This
cycle can be repeated a number of times until all of the silent mutations are eliminated.

Thus the methods of this invention can be used in a molecular backcross to eliminate
unnecessary or silent mutations.
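
How repeated backcrossing dilutes neutral (silent) mutations can be illustrated with a simple expectation model. This is a sketch under a strong assumption that is not stated in the text: in each cycle every neutral mutation is independently replaced by wild-type sequence with a fixed probability `loss_prob` (0.5 here, a hypothetical value), while selected mutations are retained by the selection step.

```python
def expected_neutral_remaining(initial: int, cycles: int, loss_prob: float = 0.5) -> float:
    """Expected number of neutral mutations left after a number of backcross cycles."""
    return initial * (1 - loss_prob) ** cycles

def cycles_until_below(initial: int, threshold: float, loss_prob: float = 0.5) -> int:
    """Smallest number of cycles after which the expected count drops below threshold."""
    if not 0 < loss_prob <= 1:
        raise ValueError("loss_prob must be in (0, 1]")
    remaining = float(initial)
    cycles = 0
    while remaining >= threshold:
        remaining *= (1 - loss_prob)
        cycles += 1
    return cycles

# Hypothetical hybrid carrying 8 neutral mutations.
print(expected_neutral_remaining(8, 3))   # 1.0
print(cycles_until_below(8, 1.0))         # 4
```

Under this model the expected count falls geometrically, which is one way to read the statement that the cycle is repeated until the silent mutations are eliminated.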
Exonuclease-Mediated Reassembly 

In a particular embodiment, this invention provides for a method for shuffling,
assembling, reassembling, recombining, &/or concatenating at least two polynucleotides to
form a progeny polynucleotide (e.g. a chimeric progeny polynucleotide that can be expressed
to produce a polypeptide or a gene pathway). In a particular embodiment, a double stranded
polynucleotide end (e.g. two single stranded sequences hybridized to each other as
hybridization partners) is treated with an exonuclease to liberate nucleotides from one of the
two strands, leaving the remaining strand free of its original partner so that, if desired, the
remaining strand may be used to achieve hybridization to another partner.

In a particular aspect, a double stranded polynucleotide end (that may be part of - or
connected to - a polynucleotide or a nonpolynucleotide sequence) is subjected to a source of
exonuclease activity. Serviceable sources of exonuclease activity may be an enzyme with 3'
exonuclease activity, an enzyme with 5' exonuclease activity, an enzyme with both 3'
exonuclease activity and 5' exonuclease activity, and any combination thereof. An
exonuclease can be used to liberate nucleotides from one or both ends of a linear double
stranded polynucleotide, and from one to all ends of a branched polynucleotide having more
than two ends. The mechanism of action of this liberation is believed to be comprised of an
enzymatically-catalyzed hydrolysis of terminal nucleotides, and can be allowed to proceed in
a time-dependent fashion, allowing experimental control of the progression of the enzymatic
process.

By contrast, a non-enzymatic step may be used to shuffle, assemble, reassemble,
recombine, and/or concatenate polynucleotide building blocks that is comprised of subjecting
a working sample to denaturing (or "melting") conditions (for example, by changing
temperature, pH, and/or salinity conditions) so as to melt a working set of double stranded
polynucleotides into single polynucleotide strands. For shuffling, it is desirable that the
single polynucleotide strands participate to some extent in annealment with different
hybridization partners (i.e. and not merely revert to exclusive reannealment between what
were former partners before the denaturation step). The presence of the former hybridization
partners in the reaction vessel, however, does not preclude, and may sometimes even favor,
reannealment of a single stranded polynucleotide with its former partner, to recreate an
original double stranded polynucleotide.

In contrast to this non-enzymatic shuffling step comprised of subjecting double
stranded polynucleotide building blocks to denaturation, followed by annealment, the instant
invention further provides an exonuclease-based approach requiring no denaturation - rather,
the avoidance of denaturing conditions and the maintenance of double stranded
polynucleotide substrates in annealed (i.e. non-denatured) state are necessary conditions for
the action of exonucleases (e.g., exonuclease III and red alpha gene product). Additionally in
contrast, the generation of single stranded polynucleotide sequences capable of hybridizing to
other single stranded polynucleotide sequences is the result of covalent cleavage - and hence
sequence destruction - in one of the hybridization partners. For example, an exonuclease III
enzyme may be used to enzymatically liberate 3' terminal nucleotides in one hybridization
strand (to achieve covalent hydrolysis in that polynucleotide strand); and this favors
hybridization of the remaining single strand to a new partner (since its former partner was
subjected to covalent cleavage).

By way of further illustration, a specific exonuclease, namely exonuclease III, is
provided herein as an example of a 3' exonuclease; however, other exonucleases may also be
used, including enzymes with 5' exonuclease activity and enzymes with 3' exonuclease
activity, and including enzymes not yet discovered and enzymes not yet developed. It is
particularly appreciated that enzymes can be discovered, optimized (e.g. engineered by
directed evolution), or both discovered and optimized specifically for the instantly disclosed
approach that have more optimal rates &/or more highly specific activities &/or greater lack
of unwanted activities. In fact it is expected that the instant invention may encourage the
discovery &/or development of such designer enzymes. In sum, this invention may be
practiced with a variety of currently available exonuclease enzymes, as well as enzymes not yet
discovered and enzymes not yet developed.

The exonuclease action of exonuclease III requires a working double stranded
polynucleotide end that is either blunt or has a 5' overhang, and the exonuclease action is
comprised of enzymatically liberating 3' terminal nucleotides, leaving a single stranded 5'
end that becomes longer and longer as the exonuclease action proceeds (see Figure 1). Any
5' overhangs produced by this approach may be used to hybridize to another single stranded
polynucleotide sequence (which may also be a single stranded polynucleotide or a terminal
overhang of a partially double stranded polynucleotide) that shares enough homology to
allow hybridization. The ability of these exonuclease III-generated single stranded sequences
(e.g. in 5' overhangs), to hybridize to other single stranded sequences allows two or more
polynucleotides to be shuffled, assembled, reassembled, &/or concatenated.

Furthermore, it is appreciated that one can protect the end of a double stranded
polynucleotide or render it susceptible to a desired enzymatic action of a serviceable
exonuclease as necessary. For example, a double stranded polynucleotide end having a 3'
overhang is not susceptible to the exonuclease action of exonuclease III. However, it may be
rendered susceptible to the exonuclease action of exonuclease III by a variety of means; for
example, it may be blunted by treatment with a polymerase, cleaved to provide a blunt end or
a 5' overhang, joined (ligated or hybridized) to another double stranded polynucleotide to
provide a blunt end or a 5' overhang, hybridized to a single stranded polynucleotide to
provide a blunt end or a 5' overhang, or modified by any of a variety of means.

According to one aspect, an exonuclease may be allowed to act on one or on both
ends of a linear double stranded polynucleotide and proceed to completion, to near
completion, or to partial completion. When the exonuclease action is allowed to go to
completion, the result will be that the length of each 5' overhang will extend far towards
the middle region of the polynucleotide in the direction of what might be considered a
"rendezvous point" (which may be somewhere near the polynucleotide midpoint).
Ultimately, this results in the production of single stranded polynucleotides (that can become
dissociated) that are each about half the length of the original double stranded polynucleotide
(see Figure 1). Alternatively, an exonuclease-mediated reaction can be terminated before
proceeding to completion.
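
The geometry of partial versus complete digestion can be sketched with simple bookkeeping. The assumptions are loud ones: the duplex starts blunt at both ends, each 3' end is trimmed by the same number of nucleotides, and digestion stops once no paired region remains. The names below are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class DigestState:
    strand_length: int    # nucleotides left on each strand
    duplex_length: int    # base pairs still paired
    overhang_5prime: int  # single stranded 5' overhang on each strand

def exo3_digest(duplex_bp: int, removed_per_end: int) -> DigestState:
    """State of a blunt linear duplex after trimming each 3' end by removed_per_end."""
    # Assumption: digestion cannot continue past the rendezvous point near the midpoint.
    limit = math.ceil(duplex_bp / 2)
    k = max(0, min(removed_per_end, limit))
    strand = duplex_bp - k
    duplex = max(duplex_bp - 2 * k, 0)
    return DigestState(strand, duplex, strand - duplex)

print(exo3_digest(100, 10))    # partial: 90-nt strands, 80 bp paired, 10-nt 5' overhangs
print(exo3_digest(100, 1000))  # complete: two 50-nt single strands, nothing paired
```

At completion each strand is half the starting length, which matches the "about half the length" outcome described for Figure 1.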

Thus this exonuclease-mediated approach is serviceable for shuffling, assembling
&/or reassembling, recombining, and concatenating polynucleotide building blocks, which
polynucleotide building blocks can be up to ten bases long or tens of bases long or hundreds
of bases long or thousands of bases long or tens of thousands of bases long or hundreds of
thousands of bases long or millions of bases long or even longer.

This exonuclease-mediated approach is based on the action of double stranded DNA
specific exodeoxyribonuclease activity of E. coli exonuclease III. Substrates for exonuclease
III may be generated by subjecting a double stranded polynucleotide to fragmentation.
Fragmentation may be achieved by mechanical means (e.g., shearing, sonication, etc.), by
enzymatic means (e.g. using restriction enzymes), and by any combination thereof.
Fragments of a larger polynucleotide may also be generated by polymerase-mediated
synthesis.

Exonuclease III is a 28K monomeric enzyme, product of the xthA gene of E. coli, with
four known activities: exodeoxyribonuclease (alternatively referred to as exonuclease
herein), RNaseH, DNA-3'-phosphatase, and AP endonuclease. The exodeoxyribonuclease
activity is specific for double stranded DNA. The mechanism of action is thought to involve
enzymatic hydrolysis of DNA from a 3' end progressively towards a 5' direction, with
formation of nucleoside 5'-phosphates and a residual single strand. The enzyme does not
display efficient hydrolysis of single stranded DNA, single-stranded RNA, or double-stranded
RNA; however it degrades RNA in a DNA-RNA hybrid, releasing nucleoside 5'-phosphates.
The enzyme also releases inorganic phosphate specifically from
3'-phosphomonoester groups on DNA, but not from RNA or short oligonucleotides. Removal
of these groups converts the terminus into a primer for DNA polymerase action.
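
The 3'-to-5' digestion described above can be illustrated with a minimal sketch. It is a toy model, not a description of the enzyme's kinetics: it assumes a blunt-ended duplex, assumes that both strands are shortened from their 3' ends at the same rate, and treats the point where the two digestion fronts meet as the "rendezvous point". The function names (`reverse_complement`, `exo3_digest`) and the example sequence are hypothetical and do not come from the specification.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def exo3_digest(top: str, steps: Optional[int] = None) -> Tuple[str, str]:
    """Toy model of 3'->5' digestion of a blunt-ended duplex.

    `top` is one strand written 5'->3'. `steps` is the number of
    nucleotides removed from the 3' end of EACH strand (assumption:
    both ends are digested at the same rate). With steps=None the
    reaction runs to completion, i.e. until no duplex region remains.
    Returns (top_remaining, bottom_remaining), both written 5'->3'.
    """
    n = len(top)
    bottom = reverse_complement(top)
    # The duplex region disappears once the two digestion fronts meet
    # near the midpoint of the molecule.
    completion = (n + 1) // 2
    k = completion if steps is None else max(0, min(steps, completion))
    # Each strand loses k nucleotides from its own 3' end.
    return top[: n - k], bottom[: n - k]


if __name__ == "__main__":
    # Run to completion: two single strands, each about half length.
    print(exo3_digest("AAAACCCCGG"))
    # Terminate early: a shorter duplex core with 5' single stranded tails.
    print(exo3_digest("AAAACCCCGG", steps=2))
```

Run to completion, the sketch returns two single strands that each retain the 5' half of their parent strand. With a smaller `steps` value it models a reaction terminated before completion.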

Additional examples of enzymes with exonuclease activity include red-alpha and
venom phosphodiesterases. Red alpha (redα) gene product (also referred to as lambda
exonuclease) is of bacteriophage λ origin. The redα gene is transcribed from the leftward
promoter and its product (24 kD) is involved in recombination. Red alpha gene product acts
processively from 5'-phosphorylated termini to liberate mononucleotides from duplex DNA
(Takahashi & Kobayashi, 1990). Venom phosphodiesterases (Laskowski, 1980) are capable
of rapidly opening supercoiled DNA.

Non-Stochastic Ligation Reassembly

In one aspect, the present invention provides a non-stochastic method termed
synthetic ligation reassembly (SLR), that is somewhat related to stochastic shuffling, save
that the nucleic acid building blocks are not shuffled or concatenated or chimerized
randomly, but rather are assembled non-stochastically.

A particularly glaring difference is that the instant SLR method does not depend on
the presence of a high level of homology between polynucleotides to be shuffled. In
contrast, prior methods, particularly prior stochastic shuffling methods, require the presence
of a high level of homology, particularly at coupling sites, between polynucleotides to be
shuffled. Accordingly these prior methods favor the regeneration of the original progenitor
molecules, and are suboptimal for generating large numbers of novel progeny chimeras,
particularly full-length progenies. The instant invention, on the other hand, can be used to
non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10^100
different chimeras. Conceivably, SLR can even be used to generate libraries comprised of
over 10^1000 different progeny chimeras (with no upper limit in sight).
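
To give a sense of scale for library sizes on the order of 10^100, the following sketch counts the chimeras in a purely combinatorial design. It rests on an assumption that is not stated in the text above: each position in the designed assembly order can be filled independently by any one of a fixed number of alternative building blocks, so the library size is the product of the number of alternatives at each position. The function names and the numbers of alternatives are hypothetical.

```python
import math
from typing import Sequence


def library_size(alternatives_per_position: Sequence[int]) -> int:
    """Number of distinct ordered chimeras, assuming an independent
    choice of building block at every position (an assumption)."""
    return math.prod(alternatives_per_position)


def positions_needed(alternatives: int, target_exponent: int) -> int:
    """Smallest number of positions p with alternatives**p > 10**target_exponent."""
    if alternatives < 2:
        raise ValueError("need at least two alternatives per position")
    target = 10 ** target_exponent
    p, size = 0, 1
    while size <= target:
        size *= alternatives  # exact integer arithmetic, no rounding
        p += 1
    return p


if __name__ == "__main__":
    print(library_size([3, 3, 3, 3, 3]))  # five positions, three alternatives each
    print(positions_needed(3, 100))       # positions needed to exceed 10^100
```

Under these assumptions, three alternatives at each of 210 positions already exceed 10^100 distinct chimeras. The count grows exponentially with the number of positions, which is consistent with the statement that no upper limit is in sight.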

Thus, in one aspect, the present invention provides a method, which method is
non-stochastic, of producing a set of finalized chimeric nucleic acid molecules having an overall
assembly order that is chosen by design, which method is comprised of the steps of
generating by design a plurality of specific nucleic acid building blocks having serviceable
mutually compatible ligatable ends, and assembling these nucleic acid building blocks, such
that a designed overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to be
assembled are considered to be "serviceable" for this type of ordered assembly if they enable
the building blocks to be coupled in predetermined orders. Thus, in one aspect, the overall
assembly order in which the nucleic acid building blocks can be coupled is specified by the
design of the ligatable ends and, if more than one assembly step is to be used, then the overall
assembly order in which the nucleic acid building blocks can be coupled is also specified by
the sequential order of the assembly step(s). Figure 4, Panel C illustrates an exemplary
assembly process comprised of 2 sequential steps to achieve a designed (non-stochastic)
overall assembly order for five nucleic acid building blocks. In a preferred embodiment of
this invention, the annealed building pieces are treated with an enzyme, such as a ligase (e.g.,
T4 DNA ligase), to achieve covalent bonding of the building pieces.
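
The way in which designed ligatable ends can dictate a single overall assembly order may be sketched as follows. This is an illustrative model only, not the procedure of Figure 4. It assumes that every internal end is a single stranded overhang written 5'->3', that two ends are compatible only when one is the reverse complement of the other, that the two terminal ends carry no overhang, and that all blocks are combined in a single step. The `Block` type, the `assemble` function and the overhang sequences are hypothetical.

```python
from typing import List, NamedTuple, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


class Block(NamedTuple):
    name: str
    left: Optional[str]   # overhang at the left end; None marks a terminal end
    right: Optional[str]  # overhang at the right end; None marks a terminal end


def compatible(right: Optional[str], left: Optional[str]) -> bool:
    """Two ends can anneal only if their overhangs are reverse complements."""
    return right is not None and left is not None and left == revcomp(right)


def assemble(blocks: List[Block]) -> List[str]:
    """Return block names in the order dictated by the ends.

    Raises ValueError if the ends do not specify exactly one order.
    """
    starts = [b for b in blocks if b.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one block with a terminal left end")
    order = [starts[0]]
    remaining = [b for b in blocks if b is not starts[0]]
    while remaining:
        candidates = [b for b in remaining if compatible(order[-1].right, b.left)]
        if len(candidates) != 1:
            raise ValueError("ends do not specify a unique next block")
        order.append(candidates[0])
        remaining.remove(candidates[0])
    return [b.name for b in order]


if __name__ == "__main__":
    # Five blocks supplied in arbitrary order; the ends fix the order A-B-C-D-E.
    pool = [
        Block("C", "ATCC", "TCAA"),
        Block("E", "TACG", None),
        Block("A", None, "ACTG"),
        Block("D", "TTGA", "CGTA"),
        Block("B", "CAGT", "GGAT"),
    ]
    print(assemble(pool))
```

Because each overhang in the example pairs with exactly one partner, the blocks can only be coupled in one order, whatever order they are supplied in. If two blocks shared the same left overhang, the sketch would raise an error rather than choose between them, which mirrors the requirement that the ends be "serviceable".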

In one aspect, the design of nucleic acid building blocks is obtained upon analysis of
the sequences of a set of progenitor nucleic acid templates that serve as a basis for producing
a progeny set of finalized chimeric nucleic acid molecules. These progenitor nucleic acid
templates thus serve as a source of sequence information that aids in the design of the nucleic
acid building blocks that are to be mutagenized, i.e., chimerized or shuffled.

In one exemplification, this invention provides for the chimerization of a family of
related genes and their encoded family of related products. In a particular exemplification,
the encoded products are enzymes. As a representative list of families of enzymes which
may be mutagenized in accordance with the aspects of the present invention, there may be
mentioned the following enzymes and their functions:



1 Lipase/Esterase
    a. Enantioselective hydrolysis of esters (lipids)/thioesters
        1) Resolution of racemic mixtures
        2) Synthesis of optically active acids or alcohols from meso-diesters
    b. Selective syntheses
        1) Regiospecific hydrolysis of carbohydrate esters
        2) Selective hydrolysis of cyclic secondary alcohols
    c. Synthesis of optically active esters, lactones, acids, alcohols
        1) Transesterification of activated/nonactivated esters
        2) Interesterification
        3) Optically active lactones from hydroxyesters
        4) Regio- and enantioselective ring opening of anhydrides
    d. Detergents
    e. Fat/Oil conversion
    f. Cheese ripening

2 Protease
    a. Ester/amide synthesis
    b. Peptide synthesis
    c. Resolution of racemic mixtures of amino acid esters
    d. Synthesis of non-natural amino acids
    e. Detergents/protein hydrolysis

3 Glycosidase/Glycosyl transferase
    a. Sugar/polymer synthesis
    b. Cleavage of glycosidic linkages to form mono-, di- and oligosaccharides
    c. Synthesis of complex oligosaccharides
    d. Glycoside synthesis using UDP-galactosyl transferase
    e. Transglycosylation of disaccharides, glycosyl fluorides, aryl galactosides
    f. Glycosyl transfer in oligosaccharide synthesis
    g. Diastereoselective cleavage of β-glucosylsulfoxides
    h. Asymmetric glycosylations
    i. Food processing
    j. Paper processing

4 Phosphatase/Kinase
    a. Synthesis/hydrolysis of phosphate esters
        1) Regio-, enantioselective phosphorylation
        2) Introduction of phosphate esters
        3) Synthesize phospholipid precursors
        4) Controlled polynucleotide synthesis